The above works on recent Macs (on Mac OS X 10.7.5 with a 64 bit x86). On a different OS it was necessary to remove the first line and each of the 6 underscores. Suitable prototypes for the routines are:
typedef unsigned long int ul;
typedef struct{ul q; ul r;} res;
res divq(res, ul);
res mulq(ul, ul);
res addq(res, res);
mulq multiplies two unsigned 64 bit integers and returns a 128 bit integer.
divq divides a 128 bit number by a 64 bit number, returning both a 64 bit quotient and a 64 bit remainder so that divq can be fully exploited.
In such a context (res){j, k} denotes j∙2^64 + k.
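The behavior described above can be sketched in portable C as a reference to test the assembly against. This is only a sketch under stated assumptions: the names mulq_ref, addq_ref and divq_ref and the helpers to_u128 and from_u128 are hypothetical, unsigned __int128 is a GCC/Clang extension, unsigned long is assumed to be 64 bits, and it is assumed that mulq returns its product with the high word in q and that addq wraps modulo 2^128.

```c
#include <assert.h>

typedef unsigned long int ul;          /* assumed 64 bits (LP64: Mac OS X, Linux) */
typedef struct { ul q; ul r; } res;    /* as a 128 bit value: q high word, r low word */
typedef unsigned __int128 u128;        /* GCC/Clang extension, not standard C */

/* (res){j, k} denotes j*2^64 + k */
static u128 to_u128(res v)    { return ((u128)v.q << 64) | v.r; }
static res  from_u128(u128 x) { return (res){ (ul)(x >> 64), (ul)x }; }

/* 64 x 64 -> 128 bit unsigned product */
res mulq_ref(ul a, ul b) { return from_u128((u128)a * b); }

/* 128 bit sum; assumed to wrap modulo 2^128 */
res addq_ref(res a, res b) { return from_u128(to_u128(a) + to_u128(b)); }

/* 128 / 64 -> quotient in .q, remainder in .r.
   The hardware instruction faults unless the quotient fits in 64 bits,
   which is the case exactly when n.q < d; the assert mirrors that. */
res divq_ref(res n, ul d) {
    assert(n.q < d);
    u128 x = to_u128(n);
    return (res){ (ul)(x / d), (ul)(x % d) };
}
```

For instance, divq_ref((res){1, 0}, 3) divides 2^64 by 3 and yields quotient 6148914691236517205 with remainder 1. Note that the same struct is read two ways: as a 128 bit operand it holds high and low words, as a divq result it holds quotient and remainder.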
The “q” in “divq” is a GAS (GNU Assembler) convention to indicate 64 bit operands.
Warning: divq is fairly slow. On a 2.4 GHz Intel Core 2 Duo, divq takes about 22 ns.
clang m.c q.s
./a.out
produces
6700417 3 32820881070691349 462887844924339 2559402536 11508347135197814379 11000000000001 1000000000000000
which is right.
lsq below serves as a 128 bit left shift and llsrs as long long signed right shift:
typedef long int L;
res llsrs(res v, int n){
  return n<64 ? (res){(L)v.q>>n, (n?(v.q<<(64-n)):0)|(v.r>>n)}
              : (res){(L)v.q>>63, (L)v.q>>(n-64)};}
res lsq(res v, int n){
  return n<64 ? (res){(v.q<<n)|(n?(v.r>>(64-n)):0), v.r<<n}
              : (res){v.r<<(n-64), 0};}
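One way to gain confidence in the two shift routines is to compare them with the compiler's own 128 bit shifts for every count from 0 to 127. The following is a sketch only: shift_mismatches and to_u128 are hypothetical names, the two routines are copied from the text above, __int128 is a GCC/Clang extension, and right shift of a negative signed value is assumed to be arithmetic (it is implementation-defined in C, and arithmetic on GCC and Clang).

```c
#include <assert.h>

typedef unsigned long int ul;          /* assumed 64 bits */
typedef long int L;
typedef struct { ul q; ul r; } res;    /* q high word, r low word */
typedef unsigned __int128 u128;        /* GCC/Clang extension */
typedef __int128 s128;

/* The two routines, as given in the text. */
res llsrs(res v, int n){
  return n<64 ? (res){(L)v.q>>n, (n?(v.q<<(64-n)):0)|(v.r>>n)}
              : (res){(L)v.q>>63, (L)v.q>>(n-64)};}
res lsq(res v, int n){
  return n<64 ? (res){(v.q<<n)|(n?(v.r>>(64-n)):0), v.r<<n}
              : (res){v.r<<(n-64), 0};}

static u128 to_u128(res v) { return ((u128)v.q << 64) | v.r; }

/* Counts the disagreements between the routines and the compiler's
   128 bit shifts over all shift counts 0..127; 0 means they agree. */
int shift_mismatches(res v) {
    int bad = 0;
    u128 x = to_u128(v);
    for (int n = 0; n < 128; n++) {
        if (to_u128(lsq(v, n)) != (x << n)) bad++;                  /* logical left shift */
        if (to_u128(llsrs(v, n)) != (u128)((s128)x >> n)) bad++;    /* signed right shift */
    }
    return bad;
}
```

The check stops at 127 on purpose: a shift count of 128 or more would make both the routines and the reference shift by a full word width, which C leaves undefined.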
This is the program that motivated these routines. It computes e and converts it to decimal to many digits. C does not expose the power of the divq command, which is exploited here for a particularly simple calculation. Before hardware floating point, computers relied on commands such as these for scaled fixed point arithmetic. They still have their uses.
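To illustrate the kind of scaled fixed point calculation meant here, the following sketch sums 1/k! in a multiword fixed point number and then peels off decimal digits. It is not the author's program: every name in it (div_small, mul_small, add_to, is_zero, e_fraction_digits, W) is hypothetical, and unsigned __int128 (a GCC/Clang extension) stands in for the assembly routines. The point to notice is in div_small, where the remainder of one 128 by 64 bit division becomes the high word of the next, which is exactly the quotient and remainder pair that divq delivers.

```c
#include <assert.h>

typedef unsigned long int ul;          /* assumed 64 bits */
typedef unsigned __int128 u128;        /* GCC/Clang extension */

enum { W = 2 };   /* fraction words: 128 bits, roughly 38 decimal digits */

/* Numbers are x[0] (integer word) followed by W fraction words, most significant first. */

/* x /= d by long division, one divq-style step per word. */
static void div_small(ul x[W + 1], ul d) {
    ul rem = 0;
    for (int i = 0; i <= W; i++) {
        u128 n = ((u128)rem << 64) | x[i];   /* rem:x[i], as divq takes RDX:RAX */
        x[i] = (ul)(n / d);                  /* quotient fits because rem < d */
        rem  = (ul)(n % d);                  /* remainder carries into the next word */
    }
}

/* x *= m over len words; returns the carry out of the top word. */
static ul mul_small(ul x[], int len, ul m) {
    ul carry = 0;
    for (int i = len - 1; i >= 0; i--) {
        u128 p = (u128)x[i] * m + carry;     /* a mulq-style product plus carry */
        x[i] = (ul)p;
        carry = (ul)(p >> 64);
    }
    return carry;
}

/* sum += term, propagating carries from the least significant word. */
static void add_to(ul sum[W + 1], const ul term[W + 1]) {
    ul carry = 0;
    for (int i = W; i >= 0; i--) {
        u128 s = (u128)sum[i] + term[i] + carry;
        sum[i] = (ul)s;
        carry = (ul)(s >> 64);
    }
}

static int is_zero(const ul x[W + 1]) {
    for (int i = 0; i <= W; i++) if (x[i]) return 0;
    return 1;
}

/* Writes ndigits decimal digits of the fractional part of e into out, NUL terminated.
   Truncation limits the trustworthy digits to somewhat fewer than 38 when W is 2. */
void e_fraction_digits(char *out, int ndigits) {
    assert(ndigits >= 0);
    ul term[W + 1] = {1};   /* 1/0! = 1 */
    ul sum[W + 1]  = {1};   /* running sum, starting with the k = 0 term */
    for (ul k = 1; !is_zero(term); k++) {
        div_small(term, k); /* term is now 1/k!, truncated */
        add_to(sum, term);
    }
    /* Multiplying the fraction by 10 pushes the next decimal digit out as the carry. */
    for (int i = 0; i < ndigits; i++)
        out[i] = (char)('0' + mul_small(sum + 1, W, 10));
    out[ndigits] = '\0';
}
```

Calling e_fraction_digits(buf, 30) with a buffer of at least 31 bytes fills it with 718281828459045235360287471352, the first 30 digits after the decimal point of e. Raising W gives more digits at the cost of more division steps per term.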
divq divides the 128 bit value in RDX:RAX by the named divisor and puts the quotient in %rax and the remainder in %rdx. See Vol. 2A 3-221 of the Intel x86 manual: “Intel® 64 and IA-32 Architectures Software Developer’s Manual”
This is useful: