Mapping

Introduction

PPM topologies implicitly define a data-to-processor assignment. Mapping routines provide the functionality of sending particles and field blocks to the proper processor, i.e. the one that 'owns' the corresponding sub-domain(s) of the computational space. Three different mapping types are provided for both particles and field data:

  • global mapping, which involves an all-to-all communication,
  • partial (local) mapping for neighborhood communication,
  • ghost mappings to update the ghost layers.

In addition, a special ring shift mapping is provided for particle data on the ring topology, and a connection mapping is provided for taking into account links between particles (e.g. chemical bonds, triangular surface elements, unstructured mesh elements). All mapping types are organized as stacks. A mapping operation consists of 4 steps:

  • defining the mapping,
  • pushing data onto the send stack,
  • performing the actual send and receive operations using MPI,
  • popping the data from the receive stack.

This architecture allows data stored in different arrays to be sent together, minimizing network latency, and allows a persistent mapping definition to be re-used by repeatedly calling the push/send/pop sequence. The individual mapping types differ only in their definition step; push, send, and pop are identical for all of them.

All mapping subroutines of PPM are available in separate optimized versions for scalar and vector data. Supported data types for particles and fields are: single and double precision floating point, single and double precision complex numbers, integer numbers, and logical values. Different data types can be mixed within the same stack, in which case they are converted to the stack data type as defined by the PPM initialization routine.
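
As a preview of the particle mapping interface detailed below, a hedged sketch of mixing data types on one stack might look as follows; ip is a hypothetical integer property array with leading dimension nint, pushed onto the same stack as the floating point property array wp (the overloaded call is an assumption based on the interface shown in the next section):

CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_push,info)
CALL ppm_map_part(ip,nint ,Np,Mp,tid,ppm_param_map_push,info)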

Mappings of field data can be masked. An optional binary mask selects which mesh points are to be mapped and which ones are not. The values of non-mapped points will remain unaffected by the mapping operation.

Global mapping

The global mapping in the PPM library involves communication of all processors with all others. The individual communication steps are synchronous and scheduled such that no conflicts occur: in communication round r, processor i sends to processor i + r (mod Nproc) while it receives from processor i − r (mod Nproc). This is repeated for all rounds r = 1, ..., Nproc − 1.
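
A minimal sketch of this conflict-free schedule in plain Fortran (not part of the PPM API; rank and Nproc are assumed to hold the local processor index and the number of processors):

INTEGER :: r, sendto, recvfrom
! enumerate the communication partners of the local processor "rank"
! in the conflict-free all-to-all schedule described above
DO r = 1, Nproc-1
   sendto   = MOD(rank + r, Nproc)           ! partner to send to in round r
   recvfrom = MOD(rank - r + Nproc, Nproc)   ! partner to receive from in round r
   ! ... exchange data with sendto / recvfrom here ...
END DO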

This mapping type is used to perform the initial data-to-processor assignment or to switch from the current topology to another one. The definition of a global mapping involves the creation of an index list of particles or mesh blocks that fall outside the local processor and a list of their new processor affiliation.

As an example, the global mapping of Np particles with the locations xp (ndims x Np floating point array) and vector properties wp (nprop x Np floating point array) consists of the following sequence of routine calls:

CALL ppm_map_part(xp,ndims,Np,Mp,tid,ppm_param_map_global,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_push  ,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_send  ,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_pop   ,info)
CALL ppm_map_part(xp,ndims,Np,Mp,tid,ppm_param_map_pop   ,info)

Np is the number of non-ghost particles on the local processor before the mapping, Mp is the new number of particles after the mapping, and tid is the unique identification number of the target topology for the mapping. This target topology needs to be defined beforehand using the PPM topology creation routine. The error status of each step is returned in info; this status argument is present in all PPM routines. For the scalar version of the mapping routines, the second argument is absent and the first one is a rank-one array. For reasons of efficiency, the first routine call not only defines the mapping (i.e. creates all the lists), but also directly pushes the particle locations onto the send stack. The pop action thus needs to be issued once more than the push action.
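
As an illustration of this scalar interface, a hedged sketch for a hypothetical rank-one property array sp (assuming the mapping has already been defined, e.g. by the ppm_param_map_global call on xp above; the leading-dimension argument is omitted as described):

CALL ppm_map_part(sp,Np,Mp,tid,ppm_param_map_push,info)
CALL ppm_map_part(sp,Np,Mp,tid,ppm_param_map_send,info)
CALL ppm_map_part(sp,Np,Mp,tid,ppm_param_map_pop ,info)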

Mapping an ndim-dimensional vector field fld, defined on the mesh mid of the current topology, requires the following sequence of routine calls:

CALL ppm_map_field(fld,ndim,tid,mid,to_mid,gs,ppm_param_map_global,info)
CALL ppm_map_field(fld,ndim,tid,mid,to_mid,gs,ppm_param_map_push  ,info)
CALL ppm_map_field(fld,ndim,tid,mid,to_mid,gs,ppm_param_map_send  ,info)
CALL ppm_map_field(fld,ndim,tid,mid,to_mid,gs,ppm_param_map_pop   ,info)

to_mid identifies the target mesh, which needs to be defined on the target topology tid. The width of the ghost layer in terms of the number of mesh points is given in gs for each spatial direction. The scalar version of the routine lacks the second argument, and the rank of the field array fld is reduced by one.
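
A hedged sketch of the corresponding scalar defining call (sfld is a hypothetical scalar field array whose rank is one less than that of fld; the second argument is omitted as described above):

CALL ppm_map_field(sfld,tid,mid,to_mid,gs,ppm_param_map_global,info)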

Partial mapping

During a partial mapping, each processor only communicates with those processors whose sub-domains are adjacent to any of its own sub-domains. The optimal communication sequence is conflict-free and requires the minimum number of communication steps. The problem can be formulated in a graph representation where each processor is a node of the graph. An edge in the graph denotes a neighborhood relation, i.e. a necessary communication. The goal is to find a coloring of the edges such that no two edges of the same color meet in any node, and such that the minimum number of different colors (corresponding to communication steps) is used. This minimum edge coloring problem is NP-complete (Holyer, 1981). An efficient approximate solution can be found using the algorithm of Vizing (Vizing, 1964), which guarantees that at most one color more than the minimum number is used (Caprara, 1998). PPM uses this algorithm to pre-compute and store the communication protocol for each defined topology.

The partial mapping is mainly used to account for particle motion during a simulation. Periodic boundary conditions at the outer faces of the computational domain are automatically taken into account. Defining a partial mapping is similar to defining a global mapping, except that the parameter ppm_param_map_partial is used in the first call to the corresponding mapping routine.
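
A hedged sketch, re-using the arrays of the global example above, thus only substitutes this parameter in the defining call:

CALL ppm_map_part(xp,ndims,Np,Mp,tid,ppm_param_map_partial,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_push   ,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_send   ,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_pop    ,info)
CALL ppm_map_part(xp,ndims,Np,Mp,tid,ppm_param_map_pop    ,info)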

Receiving ghosts

Sub-domains are extended by a layer of ghost mesh points, ghost particles, or both. These ghosts are copies of true mesh points or particles that either reside on a neighboring processor or account for periodic boundary conditions. Ghost particles or mesh points are needed for all local operations, such as finite-support particle-particle interactions, finite difference stencils, or particle-mesh interpolations. PPM provides mapping routines (ppm_map_part_ghost and ppm_map_field_ghost) for handling such ghost layers. Receiving ghosts and obtaining the copied values is done using the ppm_param_map_ghost_get parameter. For particles, ghosts can have both a position and values. Notice that in the case of stationary particles, the stack architecture of PPM's mappings allows updating the ghost values without re-defining the lists or re-sending the ghost positions.
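
A hedged sketch of such a ghost-value update (assuming the ghost mapping has already been defined via ppm_map_part_ghost with the ppm_param_map_ghost_get parameter; only the changed property values are pushed, sent, and popped again, while the lists and positions are re-used):

CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_push,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_send,info)
CALL ppm_map_part(wp,nprop,Np,Mp,tid,ppm_param_map_pop ,info)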

When performing symmetric particle-particle interactions or using particle-to-mesh interpolation, the value at the location of a ghost may change. This gives rise to ghost contributions to the corresponding real particle or mesh point on the source processor. In order to add these contributions back onto the proper real element, PPM provides a ghost sending mechanism (ppm_param_map_ghost_put). The library keeps track of which real element is the pre-image of which ghosts and of possible periodic images in the case of periodic boundary conditions.

Ring topology

When the PPM ring topology is used to compute direct interactions or to distribute data whose processor affiliation is not globally known, the ring shift mapping is used. In this mapping, each processor keeps a local copy of its data-set while a second copy is sent around the ring. This means that processor i receives a data-set from processor i − 1 (mod Nproc) while sending its previous set to processor i + 1 (mod Nproc). The ring shift mapping performs one such step per call. For the traveling set to visit all processors, the ring shift has to be executed Nproc − 1 times. After each ring shift, each processor can perform operations using both its original local data-set and the current traveling set.
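
The communication pattern of a single ring shift step can be sketched in plain MPI (this only illustrates the pattern and is not the PPM routine itself; the traveling buffer buf is assumed to hold nbuf double precision values of the same size on every processor):

INTEGER :: right, left, ierr
INTEGER :: status(MPI_STATUS_SIZE)
right = MOD(rank + 1, Nproc)            ! neighbor i+1 (mod Nproc)
left  = MOD(rank - 1 + Nproc, Nproc)    ! neighbor i-1 (mod Nproc)
! send the current traveling set to the right neighbor while
! receiving the next one from the left neighbor, without deadlock
CALL MPI_SENDRECV_REPLACE(buf,nbuf,MPI_DOUBLE_PRECISION,right,0, &
                          left,0,MPI_COMM_WORLD,status,ierr)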

Particle connections

To allow disjoint initialization or input of particles and the corresponding connections, the latter are typically initialized or read separately and are not sorted by processor. The PPM connection mapping is provided to properly distribute the connections among the processors. The ring topology is conveniently used for this mapping: each processor picks from the data being transmitted on the ring those connections whose entries correspond to one of its particles.

After the connection mapping, every processor has those connections that are associated with any of its particles. This allows for a non-symmetric calculation of the interactions. If symmetry is used, only one processor should perform the calculation. The mapping is thus followed by a pruning of the connections to ensure that only the processor that has all the member particles of a connection keeps it. This is a sufficient condition since the ghost layers in PPM are non-overlapping.
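
A minimal sketch of this pruning rule in plain Fortran (not the PPM API; cnct is a hypothetical nvert x Ncon integer array of particle indices per connection, and is_local is a hypothetical lookup reporting whether a particle is present on this processor):

LOGICAL :: keep(Ncon)
INTEGER :: icon, ivert
! keep a connection only if this processor has all of its member particles
DO icon = 1, Ncon
   keep(icon) = .TRUE.
   DO ivert = 1, nvert
      IF (.NOT. is_local(cnct(ivert,icon))) THEN
         keep(icon) = .FALSE.
         EXIT
      END IF
   END DO
END DO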

As the particles move to other processors during the course of the simulation, the connections are updated according to the partial mapping of the particles. In the case of symmetric interaction evaluations, the mapping is followed by the pruning step as described above to remove redundant connections.