About this code

After α in the code we have selected one of the ranges of the denominator, which collectively cover all possible denominators. df is the lower bound of this range, and df determines the magic value N. nl will be a list of bounds for r, one element for each stage, except that the first element is the number of extra subtracts that are conditionally necessary at the end to get the last quotient bits.

The named let rlp steps over each division iteration calculating a bound for r and passing that onto the next iteration. The initial remainder is at most twice Nd. The variable ll holds a growing list of quotient piece limits, one per iteration. Keeping those in limits is what makes this work.

For each iteration we are given an upper bound as r and must compute the new bound: r ← f(r) where f(x) = 2^M∙x − Nd∙floor(2⁻⁵⁴x). The input range for r may be divided into disjoint ranges [n∙2⁵⁴, (n+1)∙2⁵⁴ − 1] for 2^(M+z) ≤ n < 2^(M+z+1). Within such a range floor(2⁻⁵⁴x) is constant, so f is monotonic increasing. From one range to the next x increases by 2⁵⁴ and f increases by 2^(54+M) − Nd, which we have taken precautions to make positive. The maximum value for f(x) in the total range will be either for x within the last partial range, or at the end of the previous full range. For an adequate bound we compute both of these function values and take the max as the next bound for r.
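The bound step just described can be sketched in a few lines of Scheme. This is only an illustration of the rule "evaluate f at the bound and at the end of the previous full range, then take the max"; the names f and next-bound and the numbers in the usage note are hypothetical, not taken from the actual program, and Nd is assumed small enough that 2^(54+M) − Nd is positive.

```scheme
;; f(x) = 2^M * x - Nd * floor(x / 2^54), for non-negative x.
;; quotient rounds toward zero, which equals rounding down here.
(define (f x M Nd)
  (- (* (expt 2 M) x)
     (* Nd (quotient x (expt 2 54)))))

;; Given an upper bound r on the remainder, return an upper bound on f
;; over 0..r. Candidates: r itself (top of the last partial range) and
;; n*2^54 - 1 (the end of the previous full range), where n = floor(r / 2^54).
(define (next-bound r M Nd)
  (let* ((n (quotient r (expt 2 54)))
         (at-bound (f r M Nd))
         (at-previous-end
          (if (> n 0)
              (f (- (* n (expt 2 54)) 1) M Nd)
              at-bound)))               ; no previous range when n = 0
    (max at-bound at-previous-end)))
```

With the made-up values M = 1, Nd = 2⁵⁵ − 1 and r = 3∙2⁵⁴ + 5, f at the bound is only 13 while f at the end of the previous full range is 2⁵⁵, so (next-bound r 1 Nd) yields 2⁵⁵. That is why both candidates must be computed.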

For interpreting the output (at the bottom of the code): the integer 422 appears once. The next-to-last iteration will receive an r such that 0≤r≤422 when we are computing 7 bits per iteration and looking at 7+1 bits in the remainder to choose the next 7 quotient bits. In that case we may have to subtract 3Nd from the last remainder and add 3 to the quotient for the ultimate quotient. Surveying the enumerated cases, we see that we generally need just M+2 multiplier bits at each iteration.

Scheme Scraps

With enough Scheme lore you can decipher my programs. Sorry. Here are a few scraps:

To evaluate (α β γ), require that α be a function of two arguments and pass β and γ to α. The whole expression yields whatever α returns.
The function ls is ‘arithmetic-shift’, that is, a left shift in the sense of binary integers. (ls 3 2) is 12. For negative shift amounts it shifts right and loses the low bits. (ls 33 -2) is 8.
(p2 n) is 2ⁿ.
(Or m n) is the binary integer whose bits are the or of the respective bits of m and n. (Or 5 12) is 13.
(÷ m n) divides m by n and gives the integer quotient, rounding down if necessary. For non-negative integers (÷ m n) means what m/n means in C.
(λ (x y) β) is what Alonzo Church would have written λxy.β in his λ calculus. It evaluates to a function of two arguments and presumably parameters x and y occur freely in β. Scheme normally uses ‘lambda’ in place of ‘λ’.
(let ((x 3) (y 6)) (+ x y)) yields 9.
So does (let name ((x 3) (y 6)) (+ x y)) but ‘name’ is now allowed within the context of (+ x y) to name a function of two arguments.
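For readers who want to try the scraps above, here is one way the helpers might be defined. These definitions are my reconstruction from the descriptions, not the author's own; they assume non-negative operands (where quotient and rounding down agree) and a Scheme that accepts ÷ as an identifier. Many Schemes offer built-in shift and bitwise-or procedures under varying names, so the portable arithmetic versions here are a deliberate simplification.

```scheme
;; (p2 n) is 2^n for n >= 0.
(define (p2 n) (expt 2 n))

;; Left shift for k >= 0; for k < 0 shift right, dropping the low bits.
(define (ls n k)
  (if (>= k 0)
      (* n (p2 k))
      (quotient n (p2 (- k)))))

;; Integer quotient; rounds down for non-negative operands.
(define (÷ m n) (quotient m n))

;; Bitwise or of two non-negative integers, built one bit at a time.
(define (Or m n)
  (if (and (= m 0) (= n 0))
      0
      (+ (if (and (even? m) (even? n)) 0 1)
         (* 2 (Or (quotient m 2) (quotient n 2))))))

;; A named let used as a loop: 'loop' names a function of two
;; arguments that the body calls to start the next iteration.
(define sum-1-to-10
  (let loop ((i 1) (acc 0))
    (if (> i 10)
        acc
        (loop (+ i 1) (+ acc i)))))
```

With these definitions (ls 3 2) is 12, (ls 33 -2) is 8, (Or 5 12) is 13 and sum-1-to-10 is 55.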

Why Scheme

My Scheme programs have fewer bugs than my programs in any other language, other things being equal.
Most Schemes provide unbounded integers. (The one I use does.) This program has values > 2⁶⁴.
There is a little variation among Scheme systems. See this.

Some Miscellaneous Clues

For each discrete value of N, nq and rb, the variable r is a monotonic function of previous r’s or of nf and df. To determine the maximum value for the last r (when rb<0) we consider each discrete value for N and then the sequence of 53÷M values for nq and r, each time assuming worst case for nq.

The code has two nested loops, both employing Scheme’s named let pattern. The first loop, dlp, runs thru denominator ranges. For each iteration the integer j is taken from the high 9 (= M+s−1) bits of the denominator. Since the high bit is always on there are 2^(M+s) of these cases.

The second loop, rlp, goes thru the same stages as the division algorithm. The inner loop assumes that the C expression r>>54 produces the largest value for nq that the previous max on r allows. That, in turn, produces the largest r for the next iteration.

First we know the denominator, df; we iterate over 2¹⁰ (= 2^(M+z)) value ranges for d. For such a range there is some value of j such that 2¹⁰ ≤ j < 2¹¹ and j∙2⁴² ≤ d < (j+1)∙2⁴². For this range we compute a specific N = 2²⁰/(j+1) and then Nd.
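The per-range arithmetic can be made concrete with a small sketch. The helper names magic-for-j and j-of are hypothetical, and I am assuming that 2²⁰/(j+1) means the rounded-down integer quotient; the real program may compute N differently.

```scheme
;; For one range index j (2^10 <= j < 2^11), return a list of:
;;   N       the magic value, assumed to be floor(2^20 / (j+1))
;;   d-low   the lower end of the denominator range, j * 2^42
;;   Nd-low  the matching lower bound on N*d
(define (magic-for-j j)
  (let* ((N      (quotient (expt 2 20) (+ j 1)))
         (d-low  (* j (expt 2 42)))
         (Nd-low (* N d-low)))
    (list N d-low Nd-low)))

;; The range index for a 53-bit denominator d (2^52 <= d < 2^53).
(define (j-of d) (quotient d (expt 2 42)))

;; Walk all 2^10 ranges with a named let, tracking the smallest
;; and largest N seen along the way.
(define N-extremes
  (let loop ((j (expt 2 10)) (lo #f) (hi #f))
    (if (= j (expt 2 11))
        (list lo hi)
        (let ((N (car (magic-for-j j))))
          (loop (+ j 1)
                (if (or (not lo) (< N lo)) N lo)
                (if (or (not hi) (> N hi)) N hi))))))
```

Under that assumption N runs from 1023 at j = 2¹⁰ down to 512 at j = 2¹¹ − 1, and since N∙(j+1) ≤ 2²⁰ the product Nd always stays below 2⁶².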

Having fixed j and a lower bound for Nd, we consider a sequence of 53÷M upper bounds for r, which is initially Nn, each range corresponding to a value for nq that we extract from the remainder each iteration of the division loop. The first ‘remainder’ is Nn which cannot be more than twice Nd.

This variation models using carry save add (CSA) between the division iterations to save time. The rightmost y bits of the remainder are carried redundantly, and this makes those bits of r inaccessible for nq. That means we must consider the worst case and add 2^y to the bound. It is easy to modify our code to see the impact on the upper bounds of r that we provide. A new parameter, y, indicates how many bits are carried redundantly. Each iteration requires an ordinary M bit full adder to materialize these bits from the CSA output of the previous iteration, but without CSA the alternative is one 62 bit full adder each iteration.

With y=3 I see no change. This code needs close study!