I arrived at Livermore in 1955 as a programmer, and an early assignment was to visit Philadelphia to learn about the logic design of the as yet undelivered LARC. It was a fascinating experience.
The machine was decimal. The word held 12 digits. An instruction was formatted as TIIAABBMMMMM. II was the op-code. A register field in the instruction was two digits but our machine had just 26 general registers. AA selected the register operand and the contents of register BB were added to MMMMM to form the effective address. T might indicate a normal instruction, a traced instruction, or indirect addressing.
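The field layout above can be sketched in a few lines of Python. This is only an illustration: the names `Instruction`, `decode` and `effective_address` are hypothetical, the T digit is kept as a raw digit because its encoding is not given here, and the wrap-around of the address sum to five digits is an assumption rather than something I can vouch for.

```python
from typing import NamedTuple, Sequence


class Instruction(NamedTuple):
    t: int   # T: normal, traced, or indirect (encoding not given in the text)
    op: int  # II: op-code
    a: int   # AA: register operand
    b: int   # BB: register whose contents are added to MMMMM
    m: int   # MMMMM: address field


def decode(word: str) -> Instruction:
    """Split a 12-digit decimal word laid out as TIIAABBMMMMM."""
    if len(word) != 12 or not (word.isascii() and word.isdigit()):
        raise ValueError("expected exactly 12 decimal digits")
    return Instruction(
        t=int(word[0]),
        op=int(word[1:3]),
        a=int(word[3:5]),
        b=int(word[5:7]),
        m=int(word[7:12]),
    )


def effective_address(ins: Instruction, registers: Sequence[int]) -> int:
    """Contents of register BB plus MMMMM.

    Assumption: the sum is truncated to five decimal digits.
    """
    return (registers[ins.b] + ins.m) % 100_000


registers = [0] * 26          # our machine had 26 general registers
registers[3] = 25
ins = decode("012050300100")  # T=0, II=12, AA=05, BB=03, MMMMM=00100
print(ins, effective_address(ins, registers))
```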
Our LARC had 12 non-interleaved core memory boxes each with 2500 words. An entire memory cycle for a box was 4μs. The machine ran on a global 4μs cycle divided into eight 500 ns slots. Each circuit in the machine would carry a new boolean value each 500 ns. Memory latency was 5 or 6 slots, I recall. The major units in the machine, except core boxes, were tasked according to the current slot’s identity. Thus the 8 slots on the memory bus were preallocated to these 8 distinct functions:
The 26 general-purpose registers served as index registers, fixed registers and floating registers. They were hand-wound cores with a one μs cycle time and one μs latency. Memory addresses 99900 thru 99925 referred to these registers. Use of these addresses incurred a 4μs penalty.
Unlike current RISC machines, there were few adders. The main adder, allocated according to the 8 slot schedule, would calculate an effective address on one slot, an instruction address on another, a fixed point add result or the mantissa of a floating add on yet another. The adder was not, however, shared between processors, as was the case with the PPUs in the 6600.
The multiplier used a decimal version of carry-save add. The designer told me that the idea was already ancient. Wikipedia says that von Neumann invented this idea. Sequentially dependent floating adds would proceed at 4μs each. At most one instruction could be issued each 4μs. If an instruction modified an index register and the next instruction used that register in its effective address calculation then there was a 4μs penalty.
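The two issue rules just stated are simple enough to turn into a rough timing estimator. The sketch below is a toy model, not a simulator of the machine: the names `Op` and `estimate_us` are hypothetical, it assumes the index-register penalty applies only to the immediately following instruction, and it ignores every other source of delay (such as the penalty for addressing registers as memory).

```python
from typing import NamedTuple, Optional, Sequence

CYCLE_US = 4  # at most one instruction issued per 4 μs cycle


class Op(NamedTuple):
    writes_index: Optional[int]  # register this instruction modifies, if any
    index_used: Optional[int]    # register used in its effective address, if any


def estimate_us(ops: Sequence[Op]) -> int:
    """Estimate run time in microseconds under the two rules above."""
    total = 0
    prev_written: Optional[int] = None
    for op in ops:
        total += CYCLE_US  # one issue slot per instruction
        # Penalty: the previous instruction modified the register that
        # this instruction needs for its effective address.
        if prev_written is not None and op.index_used == prev_written:
            total += CYCLE_US
        prev_written = op.writes_index
    return total


program = [
    Op(writes_index=3, index_used=None),     # modifies register 3
    Op(writes_index=None, index_used=3),     # uses it at once: 4 μs penalty
    Op(writes_index=None, index_used=3),     # no penalty this time
]
print(estimate_us(program))
```

Under these assumptions the three instructions cost 4 + 8 + 4 = 16 μs.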
Given the above we can reconstruct a rough description of the degree of LARC pipelining. Here is the schedule of events for one floating add instruction in a stream of sequentially dependent floating operations (the number at the left is the relative slot number):
Comparing Stretch and LARC strategies leads me to the following points regarding allocation of hardware units to logical tasks:
The memory addresses for the LARC registers were of the form 999XX. The assembler accepted symbolic values for register designations in instructions, both 2 digits and 5 digits. The assembler required that the three discarded digits in preparing a 2 digit field be 999. This had the beneficial result of producing an error when an address of a core word was accidentally used to name a register. I have needed and missed that warning on subsequent assemblers. One such missing warning caused more than 10 crashes in a production system.
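The check is easy to reproduce. Here is a minimal sketch of the rule as I have described it, not the assembler's actual code: the function name `register_field` is hypothetical, and representing symbol values as digit strings is my own choice so that a 5 digit value such as 00042 stays distinguishable from a 2 digit one.

```python
def register_field(value: str) -> str:
    """Reduce a symbol value to the 2-digit register field of an instruction.

    A 2-digit value is used as-is. A 5-digit value must have the form
    999XX; the three discarded digits are checked, not silently dropped.
    """
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"{value!r} is not a decimal value")
    if len(value) == 2:
        return value
    if len(value) == 5:
        if value[:3] != "999":
            raise ValueError(
                f"{value} is a core address, not a register (expected 999XX)"
            )
        return value[3:]
    raise ValueError(f"{value!r} must have 2 or 5 digits")


print(register_field("07"))     # 2-digit register number
print(register_field("99907"))  # memory address of the same register
try:
    register_field("01234")     # a core word used by mistake
except ValueError as err:
    print("error:", err)
```

An assembler that simply kept the low two digits would have quietly turned 01234 into register 34, which is the kind of mistake the check caught.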
The timing rules were easy to understand for the LARC and were indeed well understood two years before the machine was shipped. The timing rules for the Stretch were hard to understand and few understand them well. This understanding came about only a year after the machine was shipped. While the Stretch missed its speed goals (and the LARC didn’t), it was less late and was also a rather faster machine.
Both machines had random access mechanical memory (moving head disks and drums) and their performance characteristics were another saga.
Chuck Leith tells of the first application on the LARC.