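We compare the operands' fractions and subtract their exponents to determine the final exponent. The comparison tells whether the quotient's fraction falls below 1, in which case the exponent difference is reduced by one. In the same cycle we decide whether to pull the emergency cord because the denominator is subnormal or the quotient is foreseen to be subnormal. Call that clock 0.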
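A minimal Python sketch of the clock-0 work, assuming IEEE 754 double operands that are finite and nonzero with a normal numerator. The function names `fields` and `clock0` and the return convention `(exponent, pull_cord)` are hypothetical. It shows why comparing fractions settles the exponent: both significands lie in [1, 2), so their quotient lies in (1/2, 2) and drops below 1 exactly when the numerator's fraction field is the smaller one.

```python
import struct

BIAS = 1023   # IEEE 754 double exponent bias
FRAC = 52     # fraction (mantissa) field width

def fields(x):
    """Split a double into (sign, biased exponent, fraction field)."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return bits >> 63, (bits >> FRAC) & 0x7FF, bits & ((1 << FRAC) - 1)

def clock0(f, d):
    """Hypothetical clock-0 step for f / d: returns (unbiased exponent, pull_cord).

    Assumes finite, nonzero operands and a normal numerator.
    """
    _, ef, ff = fields(f)
    _, ed, fd = fields(d)
    if ed == 0:
        return None, True              # subnormal denominator: pull the emergency cord
    # Both significands are in [1, 2), so the quotient is in (1/2, 2);
    # it is below 1 exactly when the numerator's fraction is the smaller one.
    below_one = ff < fd
    e = (ef - ed) - (1 if below_one else 0)   # the biases cancel in the difference
    foreseen_subnormal = e < 1 - BIAS         # smallest normal exponent is -1022
    return e, foreseen_subnormal
```

For example, 4/6 has fraction 1.0 over 1.5, so the comparison lowers the exponent difference from 0 to -1, matching 4/6 = 1.333... × 2^-1.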
Next we compute Nd and 3Nd. Nd requires 10 logic levels plus a full add. 3Nd also requires a full add and is ready 4 logic levels after Nd. Nf is needed soon too. All of this might be squeezed into 3 clocks, but 4 is more likely.
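This section does not say how N is chosen. A common reading, assumed in this sketch, is that N is a short approximation to 1/d, so that Nd is close to 1 while the quotient is unchanged (f/d = Nf/Nd). The reciprocal table is modeled with an integer division, and the widths `N_FRAC` and `TOP` are illustrative guesses, not the design's. 3Nd costs one extra add because 3Nd = Nd + 2Nd and 2Nd is only a shift.

```python
N_FRAC = 12   # fraction bits of the prescale factor N (illustrative guess)
TOP = 11      # fraction bits of d used to pick N (illustrative guess)
FRAC = 52     # IEEE double fraction bits

def prescale_factor(D):
    """Hypothetical stand-in for a reciprocal table: N * 2**N_FRAC ~ 2**N_FRAC / d.

    D is the 53-bit significand of d with the hidden bit, so d = D / 2**FRAC in [1, 2).
    """
    d_top = D >> (FRAC - TOP)                 # d truncated to TOP fraction bits
    return (1 << (N_FRAC + TOP)) // d_top     # hardware would read this from a table

def prescale(F, D):
    """Return (Nd, 3Nd, Nf) as fixed-point integers with N_FRAC + FRAC fraction bits."""
    N = prescale_factor(D)
    Nd = N * D                 # short multiply plus a full add in hardware; close to 1
    Nd3 = Nd + (Nd << 1)       # 3Nd = Nd + 2Nd: one more full add
    Nf = N * F                 # numerator scaled by the same N, so Nf / Nd == f / d
    return Nd, Nd3, Nf
```

With these guesses Nd stays within 2^-11 of 1 for every d in [1, 2). How close it must be for 10-bit digit selection depends on the selection rule, which is not given here.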
Now we iterate the following step 7 times in series. Between iterations the 63-bit partial remainder r is held as 11 high-order bits in plain binary plus 52 low-order bits in carry-save (CSA) form. The quotient q is carried along somehow, and rb is coded in some convenient way. Nd and 3Nd are carried along unmodified. We consult the high bits of r to get the next quotient digit, nq. We code the 10-bit nq according to this plan into one multiplier bit and 3 nibbles. This yields 4 augends, which are added to the 2 carry-save parts of r from the previous iteration. We also accumulate nq into q. That can cause a carry, which we may have prepared for in the previous cycle by producing one version of q assuming a carry and another assuming none. This feels doable in a clock.
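One iteration can be sketched as below, under assumptions the text leaves open: nq is a signed 10-bit digit in [-512, 511]; the "one multiplier bit and 3 nibbles" are the low bit of nq plus three radix-8 Booth digits in -4..4, each decoded from a 4-bit window, which is why only Nd, 3Nd, and their shifts are needed; and how nq is read from the high bits of r is not modeled. Python's unbounded integers stand in for the 63-bit datapath, and all of r is kept in carry-save form rather than split into 11 plain bits and 52 CSA bits. The quotient update uses on-the-fly conversion (Ercegovac and Lang), which keeps q and q - 1 side by side. With signed digits the potential carry shows up as a borrow, so this is a variant of the two-version scheme described, not necessarily the author's. All function names are hypothetical.

```python
RADIX_BITS = 10               # quotient bits retired per iteration
RADIX = 1 << RADIX_BITS

def recode(nq):
    """Split a signed 10-bit digit into (b0, [d0, d1, d2]) with
    nq == b0 + 2 * (d0 + 8*d1 + 64*d2), b0 in {0, 1}, each d in -4..4 (radix-8 Booth)."""
    assert -512 <= nq < 512
    u = nq & 0x3FF            # 10-bit two's complement pattern of nq
    b0, rest = u & 1, u >> 1  # the lone multiplier bit and the 9 bits above it
    digits, prev = [], 0      # prev is the overlap bit below each 4-bit window
    for i in range(3):
        w = (rest >> (3 * i)) & 0b111
        digits.append(-4 * (w >> 2) + 2 * ((w >> 1) & 1) + (w & 1) + prev)
        prev = w >> 2
    return b0, digits

def csa(a, b, c):
    """3:2 carry-save adder: a + b + c == s + cy, with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def step(r_sum, r_carry, nq, nd, nd3):
    """New remainder RADIX*r - nq*Nd, returned in carry-save form (sum, carry)."""
    b0, digits = recode(nq)
    multiple = {0: 0, 1: nd, 2: nd << 1, 3: nd3, 4: nd << 2}   # only Nd, 3Nd and shifts
    augends = [-(b0 * nd)]
    for i, d in enumerate(digits):
        m = multiple[abs(d)] << (1 + 3 * i)                   # digit weight is 2 * 8**i
        augends.append(-m if d > 0 else m)                    # subtract d * weight * Nd
    ops = [r_sum << RADIX_BITS, r_carry << RADIX_BITS] + augends   # 6 operands
    s1, c1 = csa(*ops[0:3])          # 6 -> 4
    s2, c2 = csa(*ops[3:6])
    s3, c3 = csa(s1, c1, s2)         # 4 -> 3
    return csa(s3, c3, c2)           # 3 -> 2

def accumulate(q, qm, nq):
    """Append digit nq to q by on-the-fly conversion; qm is always q - 1.

    The low RADIX_BITS of a shifted value are zero, so | appends a digit
    instead of adding, and no carry ever propagates into q.
    """
    if nq > 0:
        return (q << RADIX_BITS) | nq, (q << RADIX_BITS) | (nq - 1)
    if nq == 0:
        return q << RADIX_BITS, (qm << RADIX_BITS) | (RADIX - 1)
    # Negative digit: the borrow is absorbed by selecting the q - 1 version.
    return (qm << RADIX_BITS) | (RADIX + nq), (qm << RADIX_BITS) | (RADIX + nq - 1)
```

Three levels of 3:2 adders reduce the six operands to two; together with the digit decode and the choice between q and q - 1, that is the work the text estimates will fit in one clock. Starting from q = 0 and qm = -1, each call to `accumulate` only selects and concatenates.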
There is little to do at the end. Optimistically, this is a 12-clock operation.