Sorting as an Early and Modern(?) Software Paradigm

John von Neumann showed how to sort n records on magnetic tape in n log(n) steps and log(n) tape passes, using three tape drives. This was a big deal circa 1950 when a big machine had perhaps 1000 words of memory.
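To make the idea concrete, here is a small in-memory sketch of the merge-pass scheme, written as an illustrative analogue in Python rather than von Neumann's actual three-tape procedure: each pass merges pairs of sorted runs, doubling the run length, so roughly log(n) passes suffice and every record is touched once per pass.

    # A minimal sketch of the merge-pass idea behind tape sorting (an
    # illustrative analogue, not von Neumann's exact three-tape procedure).
    # Each pass merges pairs of sorted runs, doubling the run length, so
    # about log2(n) passes suffice and each pass touches every record once.

    def merge(a, b):
        """Merge two sorted lists into one sorted list."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    def merge_pass_sort(records):
        """Sort by repeated merge passes over runs of doubling length."""
        runs = [[r] for r in records]          # initial runs of length 1
        while len(runs) > 1:                   # one "tape pass" per iteration
            merged = []
            for k in range(0, len(runs), 2):
                if k + 1 < len(runs):
                    merged.append(merge(runs[k], runs[k + 1]))
                else:
                    merged.append(runs[k])
            runs = merged
        return runs[0] if runs else []

    print(merge_pass_sort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]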

I recall the logic of an assembler predicated on the notion that the symbol table was much too big to fit in memory at once. The first pass transcribed cards, one code line per card, to tape if they were not already there. A later pass assigned an absolute memory address to each instruction and produced a second tape file containing the addresses and the original lines of assembler code. Another file was produced, perhaps on the same pass, in which a line of code that referred to n symbols was replicated n times, once with each symbol, in turn, appearing at the beginning of the line. That file was sorted, bringing the definition and every usage of a given symbol into proximity. A pass over the sorted file transferred the known value of each symbol to the code lines that required it. The output of that pass was then resorted into original program order. At this point the redundant code lines were discarded, after the symbol values they carried had been extracted. An assembler listing and binary cards were also produced on this last pass.
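Here is a rough sketch of that sort-and-resort scheme, with Python lists standing in for the tape files; the record layout and field names are my own stand-ins, not the original assembler's.

    # A rough in-memory sketch of the sort-and-resort symbol resolution
    # scheme described above. Lists stand in for tape files; the record
    # layout and field names are stand-ins, not the original assembler's.

    # (line_no, address, label_defined, symbols_referenced, text)
    lines = [
        (1, 100, "START", ["X"],     "START  LOAD X"),
        (2, 101, None,    ["START"], "       JUMP START"),
        (3, 102, "X",     [],        "X      DATA 7"),
    ]

    # Pass 1: replicate each line once per symbol it defines or references,
    # with that symbol first so a sort brings definition and uses together.
    records = []
    for no, addr, label, refs, text in lines:
        if label:
            records.append((label, "def", addr, no, text))
        for sym in refs:
            records.append((sym, "use", None, no, text))

    # Sort by symbol; "def" sorts before "use", so a definition leads its uses.
    records.sort(key=lambda r: (r[0], r[1]))

    # Pass 2: sweep the sorted file, carrying each symbol's value to its uses.
    resolved = []
    current_sym, current_value = None, None
    for sym, kind, addr, no, text in records:
        if kind == "def":
            current_sym, current_value = sym, addr
        elif sym == current_sym:
            resolved.append((no, sym, current_value))
        else:
            resolved.append((no, sym, None))   # undefined symbol

    # Pass 3: resort into original program order and report the values.
    resolved.sort()
    for no, sym, value in resolved:
        print(f"line {no}: {sym} = {value}")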

A bit later I heard a description of a Cobol compiler that ran on a machine with not much more than 1000 words. The talk described many more passes than the assembler used, and in some detail; it was persuasive that this was feasible. I recall thinking that this sort of discipline might make compiler design easier, though I do not now remember why. Perhaps it was merely the excellent design of that particular compiler that led to this strange conclusion.

Rather later, the 1401 Fortran compiler, which ran in 8K characters of core memory, proved again that such compilers were possible. There the program material stayed in core while 63 very small sequential code passes came through memory, each working on one phase of the compilation.
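A tiny sketch of that structure, assuming nothing about the 1401 compiler beyond what is said above: the program material stays resident while a sequence of small phases streams through and transforms it in place. The two phases shown are placeholders for the 63.

    # A tiny sketch of the "resident program, streaming phases" structure.
    # The phases here are placeholders, not the 1401 compiler's.

    program = ["      DO 10 I=1,N   ", "10    A(I) = B(I) + C"]  # resident material

    def strip_trailing_blanks(lines):
        return [ln.rstrip() for ln in lines]

    def number_lines(lines):
        return [f"{i:04d} {ln}" for i, ln in enumerate(lines, start=1)]

    phases = [strip_trailing_blanks, number_lines]   # stand-ins for 63 phases

    for phase in phases:          # each phase sweeps the resident program once
        program = phase(program)

    print("\n".join(program))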

Today (2009) processor speed has outrun RAM latency by a factor sufficient to harken back to these ancient ideas. I am now trying to organize a computation that requires modeling cause and effect traveling between elements of the simulation. I shall consider the sorting paradigm.
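A minimal sketch of what the paradigm might look like here, assuming a discrete step in which each element emits effects addressed to other elements; the element and effect shapes are placeholders, not a settled design. The point is to collect the effects, sort them by destination, and apply them in one sequential, cache-friendly sweep rather than chasing pointers between elements.

    # A minimal sketch, assuming a discrete simulation step in which each
    # element emits effects addressed to other elements. The state and
    # effect structures are placeholders, not a settled design.

    state = {"a": 0.0, "b": 0.0, "c": 0.0}           # element id -> value

    def step(state, effects):
        """Apply one round of cause-and-effect via the sorting paradigm."""
        # effects: list of (destination, delta) produced by the elements
        effects.sort(key=lambda e: e[0])             # group by destination
        new_state = dict(state)
        for dest, delta in effects:                  # one sequential sweep
            new_state[dest] += delta
        return new_state

    # One round: "a" influences "b" and "c", some element influences "a", etc.
    effects = [("b", 1.0), ("c", 0.5), ("a", -0.25), ("b", 0.75)]
    print(step(state, effects))    # {'a': -0.25, 'b': 1.75, 'c': 0.5}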