These are ideas that I heard from Livermore after I left Livermore. They bear on multi processing to do physics. I embellish them a bit. Once (2+1)D problems predominated and I will stick to them here—the (3+1)D algorithms are not too much different for the purposes of this note. A (2+1)D problem involves two spatial dimensions and one time dimension. The calculation proceeds just as the simulated system proceeds—simulated time advances as the calculation proceeds.
In simple cases the 2D mesh is fixed, at least for periods of time. It may be rectangular or irregular. If it changes it probably ceases to be rectangular. I will assume an irregular 2D mesh here. There are from 10 to perhaps 30 floating point numbers per zone. If the mesh is irregular there are additional integers or pointers that determine the zone’s neighbors. Sophisticated code may use storage contiguity to supplant some of the mesh topology information, just as OpenGL supports regular sets of triangles within a larger irregular set of triangles.
The 2D mesh is divided into zones and there is a symmetric relation between zones called N, (for neighbor). Zones are allocated to processors each with their own memory. This allocation attempts to put neighbors in the same processor but this cannot be done completely. Zones with neighbors in other processors are called boundary zones here. Two processors are neighbors if they have been allocated neighboring zones. A particular processor is likely to have about 6 neighbors. In (3+1)D physics it is more like 12 neighboring processors. Hopefully most zones are not boundary zones.
Heat flow is a real but atypically simple physics mesh problem. Here I describe explicit and implicit heat flow calculations. The above page does not address MP however. I will talk here about explicit calculations first. As the above page describes, in one time step a new zone state depends on only its own current state and those of its neighbors. In one moderately realistic mesh application it took about 5000 machine instructions to do this calculation for one time step and one zone.
The obvious problem is reading the state of a neighbor when it is not in the same processor. The new idea alluded to above is to begin each time step with the states of all neighbors of all the processor’s zones stored on that processor, storage contiguity respected. This is done for each processor. Now each zone’s state can be computed and there is nothing to do for each processor to send the state of boundary zones to the neighboring processors. Note that while these messages travel new states can be computed for zones whose neighbors are not boundary zones.
Chuck Leith used this pattern in about 1987 to program the BBN Butterfly MP system (TC-2000).
The above pattern can be captured to a degree as a body of source code but the interface between physics code and pattern code is very intimate!
The implicit version of this plan is a good deal more complex. There is one variable per zone. (The variable may be a floating value for simple physic, or a vector or tensor for more complex physics.) The implicit solution proceeds by eliminating variables in the sense of high school algebra. It turns out that each processor can eliminate variables for its non-boundary zones and each processor proceeds independently of the others. The result is a linear equation for each boundary zone which involves the other boundary zones in the same processor. Then it gets difficult. Processors can form neighborhood cliques and eliminate variables on their shared boundaries. Depending on the physics it may be that many terms in these linear equations drop out for being insignificant. These cliques have boundaries themselves and the growing set of equations must be solved. The worst case is that the number of terms in each equation grows as size of the perimeter of the clique. When all of the variables have been eliminated the back substitution may proceed in exactly the reverse order of the elimination whereupon the sought values are available and located where they are needed.
The above is obviously not a program or even a program design, but it is a suggestion of an organization. I suspect that for efficiency it is usually necessary to fill in the program details from knowledge of how fast the physics effects traverse the mesh and in what directions.