Circa 1959

Many programs had to keep a numerical model of a simulated physics problem that was much larger than core memory (RAM). For each of a few thousand time steps, all the model data would be passed over to compute the same data for the next time step. On the 701 and 704, magnetic tapes were fast enough and capacious enough to do the job. Typically the model was a 2D array of points in a mesh with 10 to 30 floating point numbers per mesh point. The obvious way was to put the model sequentially on mag tape, one tape block per row in the mesh. Two or three rows would be read in from tape, which was enough to compute the first row of the next time step. Thereafter a row would be read, a row calculated and a row written until the next time step had been entirely calculated and written to a second tape. The two tapes would then be rewound and the process repeated with the roles of the two tapes reversed.
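In modern terms, the read-a-row, calculate-a-row, write-a-row sweep might be sketched as below. This is only an illustration in Python, not the original machine code: the names `sweep` and `update_row` are hypothetical, the averaging rule stands in for whatever physics the real programs computed, and the edge rows are handled by an assumed boundary rule (the missing neighbor is replaced by the row itself).

```python
from collections import deque


def update_row(above, row, below):
    # Placeholder "physics" (an assumption): average each point with its
    # vertical neighbors. A real code would update 10 to 30 numbers per point.
    return [(a + r + b) / 3.0 for a, r, b in zip(above, row, below)]


def sweep(rows_in):
    """Yield the rows of the next time step while holding at most
    three input rows in memory, as if streaming from tape to tape."""
    window = deque(maxlen=3)  # the only rows resident in "core"
    for row in rows_in:       # one tape block read per iteration
        window.append(row)
        if len(window) == 2:
            # Two rows are enough for the first output row (assumed
            # boundary rule: the missing row above is the row itself).
            yield update_row(window[0], window[0], window[1])
        elif len(window) == 3:
            # Steady state: read a row, calculate a row, write a row.
            yield update_row(window[0], window[1], window[2])
    # The last row has no row below it; apply the same boundary rule.
    if len(window) == 1:
        yield update_row(window[0], window[0], window[0])
    elif len(window) >= 2:
        yield update_row(window[-2], window[-1], window[-1])


next_step = list(sweep(iter([[0.0], [3.0], [6.0]])))
```

Because `sweep` is a generator over an iterator, the input can be arbitrarily longer than memory; only the three-row window is ever resident, which is the point of the tape layout.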

Rewinding tapes was not much faster than reading or writing them. Usually four tape drives were available, and each copy of the model data would be split equally between two tape reels. After the first half of the array had been read from the first input reel, that reel would be rewound; shortly thereafter, when half of the next time step had been written out, the first output reel would be rewound as well. Each reel was thus rewound while the other half of the mesh was being processed, and there was no waiting for rewinds.
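A rough timing model shows why splitting each copy across two reels hides the rewinds. This is a sketch under stated assumptions: the function name `sweep_wait_time`, the reel labels, and the time units are all hypothetical, and a rewind is assumed to start the moment a reel's half has been processed.

```python
def sweep_wait_time(half_time, rewind_time, steps):
    """Total time spent stalled on rewinds in the four-drive scheme.

    half_time   -- time to process half the mesh (assumed constant)
    rewind_time -- time to rewind a reel holding half the mesh
    steps       -- number of time steps to simulate
    """
    # ready[r] is the clock time at which reel r is rewound and usable.
    ready = {"in1": 0.0, "in2": 0.0, "out1": 0.0, "out2": 0.0}
    src, dst = ("in1", "in2"), ("out1", "out2")
    clock = waited = 0.0
    for _ in range(steps):
        for half in (0, 1):
            # Stall until both the input and output reel for this half
            # have finished rewinding.
            start = max(clock, ready[src[half]], ready[dst[half]])
            waited += start - clock
            clock = start + half_time
            # Both reels start rewinding as soon as their half is done.
            ready[src[half]] = ready[dst[half]] = clock + rewind_time
        src, dst = dst, src  # the roles of the tapes reverse each step
    return waited
```

In this model there is no stall as long as a half-reel rewind takes no longer than processing the other half: `sweep_wait_time(10.0, 8.0, 5)` gives `0.0`, while a slower rewind such as `sweep_wait_time(10.0, 15.0, 2)` stalls for `5.0` time units.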

With the advent of the IBM 709 this tape I/O was overlapped with computing. This provided the expected performance improvement but did not impact the tape logic described here.

About 1957 a new programmer at Livermore came up with the following clever scheme. The trick is simple, but I can’t imagine how he thought of it. He divided the mesh into 3 equal parts. Parts 1 and 2 were initially recorded on reel A, which was then rewound. Part 3 was recorded on reel B, which was not rewound. The calculation now commenced, reading part 1 from reel A and writing onto the end of reel B. When part 1 of the array had been processed, reel B was rewound and writing was switched to reel C. Part 2 of the mesh was read from reel A and written to reel C. When this was done, reel B had finished rewinding and was ready to supply part 3, which was read while output continued to go to reel C. When part 3 was finished, reel C was rewound. At this point the next time step of the simulation was complete: part 1 was on the end of reel B, which had finished rewinding, while parts 2 and 3 were on reel C, which was rewinding.

We now begin the next complete sweep of the mesh by reading part 1 from reel B and writing on reel A. When part 1 is done we rewind reel B and continue by reading part 2 from reel C, which has just finished rewinding. Part 2 is also written to reel A. When part 2 is done we rewind reel A, read part 3 from reel C and write it to reel B. This is where we came in. We have overlapped tape rewind with the rest of the tasks using only three tape drives and reels.
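The three-reel rotation is easy to get lost in, so here is a small simulation that checks it. It is a sketch, not a reconstruction of the original program: the `Reel` class, the `run` function, and the idea of counting time in "phases" (one phase per third of the mesh) are hypothetical. It also assumes a rewind takes no longer than one phase, and that the rewind order is B, A, C in every sweep, which fills in the two rewinds the description leaves implicit (reel A in the first sweep, reel C in the second).

```python
class Reel:
    def __init__(self, blocks=(), pos=0):
        self.blocks = list(blocks)  # parts on tape, as (part, time_step)
        self.pos = pos              # head position, counted in parts
        self.ready_at = 0           # first phase in which reel is usable

    def _check(self, phase):
        if phase < self.ready_at:
            raise RuntimeError("reel used while still rewinding")

    def read(self, phase):
        self._check(phase)
        block = self.blocks[self.pos]
        self.pos += 1
        return block

    def write(self, phase, block):
        self._check(phase)
        del self.blocks[self.pos:]  # writing destroys anything past the head
        self.blocks.append(block)
        self.pos += 1

    def rewind(self, phase):
        # Issued at the end of `phase`; busy for the whole next phase.
        self.pos = 0
        self.ready_at = phase + 2


# (source reel, destination reel) for parts 1, 2, 3 of each sweep.
FIRST_SWEEP = [("A", "B"), ("A", "C"), ("B", "C")]
SECOND_SWEEP = [("B", "A"), ("C", "A"), ("C", "B")]
REWIND_AFTER = ["B", "A", "C"]  # assumed identical in both sweeps


def run(sweeps):
    reels = {
        "A": Reel([(1, 0), (2, 0)]),  # parts 1 and 2, rewound
        "B": Reel([(3, 0)], pos=1),   # part 3, left at end of tape
        "C": Reel(),                  # scratch reel
    }
    phase = 0
    for s in range(sweeps):
        plan = FIRST_SWEEP if s % 2 == 0 else SECOND_SWEEP
        for i, ((src, dst), rw) in enumerate(zip(plan, REWIND_AFTER)):
            part, step = reels[src].read(phase)
            assert part == i + 1      # parts must arrive in mesh order
            reels[dst].write(phase, (part, step + 1))
            reels[rw].rewind(phase)
            phase += 1
    return {name: reel.blocks for name, reel in reels.items()}
```

After two sweeps, `run(2)` leaves parts 1 and 2 on reel A and part 3 on reel B, each advanced by two time steps, which is the starting layout again. No `RuntimeError` is raised along the way, meaning that in this model no reel is ever needed while it is still rewinding. Reel C still holds the previous time step, which the next sweep overwrites.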

Another tactic was to read tape backwards. The 701 and the Univac could read tape backwards, but later machines lost the ability. (The 360s could read tape backwards, but we did not use them at Livermore.) The memory of the Univac was so small (1000 words) that not even a whole row could be read in. Typically a 3 by 3 array of points would be accommodated in memory (mercury delay lines) at once. The Univac had 10 tape drives, which was enough to do early 2D meshes.