This week I listened to two talks on the outlook for computer architecture. Dave Patterson described RAMP, a Berkeley initiative, and William Dally presented Merrimac, a Stanford plan. There is emphatic agreement that massive parallelism is the next big thing. Merrimac presumes that software (rather than hardware cache logic) will manage data flow through a fractal structure of memory, busses and processors. (There was a decade or so when the fastest computers eschewed caches. Seymour Cray built them.) They claim to have ideas on how software can manage this, but I have not yet found this information. I do not know whether RAMP is still agnostic on this issue. The most interesting idea I heard from Patterson was that of dwarfs, which are 14 categories of application sorted according to their susceptibility to distribution. I think these categories may play a useful role in guiding architecture, and benchmarks too.
As daunting as the hardware architecture is, I think the software problem is worse. I think that coherent caches do not suit most of the dwarfs. Current multi-threading with locks and coherent caches is very error prone and a very inefficient way to move data. I know no practical solution except languages that do not call out the order of computation. There are damn few of these and I know none well. I doubt that languages with assignment can fill this role unless there is a principled means of limiting the scope of the variables. Of these languages only Sisal, as far as I know, was moderately portable and tried to match some particular hardware systems. Each new hardware architecture would require new compiler strategies. Sisal made many of the right claims but I can’t vouch for it. It seems to have been 20 years ahead of its time. The need may be so pressing now that it (or its ideas) must be revived. Even more daunting, it may be necessary to remove the concept of address from our computing languages. This eviscerates just about all of the familiar and conventional tools for ordering calculations within a computer. Haskell does type inference, which avoids the interpretive overhead of constant type checking; it also eschews assignment. Haskell seems devoted to lazy evaluation and garbage collection. These may impose a substantial penalty on most of the dwarfs. I don’t know how deeply these presumptions are built into the language. A critical language design issue here is whether the language can sensibly allow the programmer to provide information that guides the mapping onto the hardware. Modern languages often provide semantically void pragmas that are performance hints to the compiler. This might be extended to include mapping pragmas.
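To make the idea of a semantically void mapping pragma concrete, here is a minimal sketch in Python. Everything named in it is my own invention for illustration: the `map_hint` decorator, the `MAPPING_HINTS` registry, and the hint keys `partition`, `tile` and `placement` are hypothetical and belong to no real compiler or runtime. The point is only that the hint is recorded beside the function and cannot change what the function computes.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry. A real compiler or runtime would consume these hints;
# here they are only recorded.
MAPPING_HINTS: Dict[str, Dict[str, Any]] = {}


def map_hint(**hints: Any) -> Callable[[Callable], Callable]:
    """Attach mapping hints to a function without changing what it computes."""
    def decorate(fn: Callable) -> Callable:
        MAPPING_HINTS[fn.__name__] = dict(hints)
        return fn  # returned unchanged, so the hint is semantically void
    return decorate


# The hint names and values below are assumptions made up for this sketch.
@map_hint(partition="blocks", tile=64, placement="near_memory")
def stencil(grid: List[float]) -> List[float]:
    """Average each interior cell with its two neighbours; ends are kept.

    The function is pure: it reads its input and builds a new list, so the
    order in which the interior cells are computed does not matter.
    """
    if len(grid) < 3:
        return list(grid)
    interior = [
        (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
        for i in range(1, len(grid) - 1)
    ]
    return [grid[0]] + interior + [grid[-1]]
```

Deleting the decorator line leaves every result identical, which is the property that makes such a pragma safe to ignore on hardware where it does not apply.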
Patterson suggested, and I agree, that it will often be necessary to instrument the code to learn dynamically about the real shape of the computation. In many applications the shape is unknown, and learning the shape may indeed be the point of the computation. The current convenient division between static and dynamic, or between compile time and run time, may be doomed. One particular case is adaptive mesh refinement, where the topology of the mesh changes as the program proceeds. This will often require remapping the calculation onto the hardware. Perhaps this action can be triggered by the logic that adapts the mesh. The same phenomenon strikes in Unstructured Grids, where topology changes are typically gradual and more local and will usually not require remapping. Here is a physics program using an unstructured grid. Perhaps monitoring data flow and processor load can trigger remaps.
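As a rough illustration of letting measured processor load trigger a remap, here is a small Python sketch. It is not taken from either talk. The function names, the 1.25 threshold and the greedy heaviest-first assignment are all my assumptions, and the sketch ignores communication cost and mesh locality, which a real remapper could not afford to do.

```python
from typing import Dict, List, Tuple


def proc_loads(costs: Dict[int, float], owner: Dict[int, int],
               n_procs: int) -> List[float]:
    """Sum the measured cost of the cells owned by each processor."""
    loads = [0.0] * n_procs
    for cell, proc in owner.items():
        loads[proc] += costs[cell]
    return loads


def imbalance(loads: List[float]) -> float:
    """Ratio of the busiest processor's load to the mean load (1.0 is ideal)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def greedy_remap(costs: Dict[int, float], n_procs: int) -> Dict[int, int]:
    """Assign cells heaviest first, each to the currently least-loaded processor."""
    loads = [0.0] * n_procs
    owner: Dict[int, int] = {}
    for cell, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        proc = loads.index(min(loads))
        owner[cell] = proc
        loads[proc] += cost
    return owner


def maybe_remap(costs: Dict[int, float], owner: Dict[int, int], n_procs: int,
                threshold: float = 1.25) -> Tuple[Dict[int, int], bool]:
    """Remap only when the measured imbalance exceeds the (assumed) threshold."""
    if imbalance(proc_loads(costs, owner, n_procs)) > threshold:
        return greedy_remap(costs, n_procs), True
    return owner, False


# Usage: refinement has left every cell on processor 0 of 2.
costs = {0: 4.0, 1: 1.0, 2: 1.0, 3: 2.0}   # per-cell cost from instrumentation
owner = {cell: 0 for cell in costs}
owner, remapped = maybe_remap(costs, owner, n_procs=2)
```

The threshold is what separates the two cases above. Gradual, local changes in an unstructured grid would rarely push the ratio over it, while a burst of mesh refinement would.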