This is to record details of the fast multiply for the decimal LARC computer and obvious extensions to other bases. The general scheme is called a Carry-save adder which was used to represent the intermediate sums in an otherwise straight forward multiply. The modern versions of carry-save add produce two numbers from three as performed by routine cook described here. For the purposes of multiply that scheme has displaced all others in modern multiply hardware. In that application the need is for speed at the expense of storage efficiency—a factor of two can be tolerated by enabling a two logic level add. The intermediate results for the LARC multiply used just one extra bit per digit, whereas modern machinery uses twice as many bits. The tradeoff is between combinatory logic levels required to perform an add, and bits required to keep an intermediate result between adds. There is thus an economy lesson to be learned here that may provide a strategic format for binary numbers where there is more need for storage efficiency than in the inner loop of a multiply. Most modern machines have floating point register files which could perhaps store operands in a somewhat bulkier format to speed the floating operations.

I propose a very specific format which I call redundant base R format. It is redundant for it uses one more bit per ‘digit’ than absolutely needed. The extra bit may be thought of as a resting place for carries racing leftward in mid calculation. One way to describe this, which we adopt on this page, is to say that each digit can take on any one of 2R values, even though the radix is R. If the digits are numbered from the right, ... d2, d1, d0, then the value of an integer expressed in this format is ΣidiRi. When R=2 the numerals 21 and 101 both denote five—thus the redundancy.

The crucial insight is that two numbers in this format can be summed to produce another such number in fewer logic levels than are required in the conventional binary representation. Perhaps as important is that two numbers, one in the redundant form, and the other in normal form, where every digit is less than the radix, can be summed to produce a redundant form with even less combinatoric logic. See routine adc here. The critical property of the code for adc is that the calculation within the loop does not need values produced by previous iterations of the loop. The loop iterations can occur in any order, or even in parallel! The latency of the naïve long carry chain has been largely avoided at the cost of extra storage and some extra control sequences and data paths to put the data in the ‘external form’ expected in the more conventional parts of the system.

Another strategic observation is that when the radix R is a power of 2 the awkward function md takes less logic, assuming binary representation of the digits.


Here is some more info with a math slant.
This file explores the task of adding two redundant forms. The resulting digits may be as large as 2R+1 in the obvious scheme. This is not good.