This is perhaps a simpler introduction.

In the days of the mini-computer, a decade before the micro computer, when an integrated circuit might have 100 gates, there were commercial integrated circuits designed around the notion of ‘register slice’. (There still are.) Such an IC would be for 4 bits of an ALU (Arithmetic Logic Unit) with one or two registers. Eight such chips would compose most of the data circuitry of a 32 bit ALU.

Let dj be the value of some 4 bits within slice j. The value of the corresponding 32 bit number in a gang of eight such ICs = Σj[0≤j<8]dj16j.

Addition of two 32 bit numbers was a significant chore for this architecture. Generation, propagation and assimilation of carries was a challenge. Here is a solution I remember. Each IC had an adder which saw 4 bits from each of the two inputs and provided 4 bits of sum. How to coordinate carries? Each adder produced two sums, one assuming carry in and the other assuming none. It also generated and exported:

Subscripts here start from the right. These two signals from each of the 8 ICs were gathered and processed to produce these 8 values for 0≤i<8: Ci served as the ‘carry in’ to slice i+1 where it was used internally to select one of the two previously computed four bit sums. C7 became the overflow bit. The carry in for slice 0 was 0 for add and 1 for subtract. Each subtrahend bit was complemented for subtract. Collectively these 8 selections form the 32 bit sum. See some corroboration. All of the whiles are to be unrolled before hardware implementation. All of the ifs become known after unrolling.

There are 56 solutions to 0≤i<j<k<8 and the circuit complexity for computing C is that many AND legs. The maximum fan-in is 6. I think one standard IC would compute all the C’s.

Strategic use for carry-in

One’s Complement

The above is for two’s complement which was the common case. For one’s complement add carries could wrap around and any contiguous set of P’s in this circular sense could affect the answer. If rot(a, n) returns the rotation of a by n bits, then:
rot(a⊕b, n) = rot(a, n) ⊕ rot(b, n)
where ⊕ is an add between one’s complement numbers.

The modified equation is:

where i, j and k range implicitly over [0, 7] and
ord(i, j, k) = i<j & j<k | j<k & k<i | k<i & i<j = ((i<j) + (j<k) + (k<i)) == 2.
This means that i, j and k are in cyclic order. The last expression exploits C’s notion of numeric booleans.

This corroborates the above modification. There are now 168 legs of ANDs.