128 bit operands on x86

Here are 3 C callable routines, written for the x86 GNU assembler (GAS), that use x86 commands to process 128 bit numbers. Two of the commands are variations of the commands that Intel calls mul and div. GAS calls these variations “mulq” and “divq”.

The above works on recent Macs (on Mac OS X 10.7.5 with a 64 bit x86). On a different OS it was necessary to remove the first line and each of the 6 underscores. Suitable prototypes for the routines are:

typedef unsigned long int ul;
typedef struct{ul q; ul r;} res;
res divq(res, ul);
res mulq(ul, ul);
res addq(res, res);

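mulq multiplies two unsigned 64 bit integers and returns a 128 bit integer. divq divides a 128 bit number by a 64 bit number, returning a 64 bit quotient and a 64 bit remainder; it takes the full 128 bit dividend in order to fully exploit divq. The quotient must fit in 64 bits, so the high word of the dividend must be less than the divisor, otherwise the processor raises a divide error. Where a res holds a 128 bit value, (res){j, k} denotes j∙2⁶⁴ + k.
The “q” in “divq” is a GAS (GNU Assembler) convention to indicate 64 bit operands.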
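As a cross-check on the semantics just described, here is a minimal reference model in plain C. It is a sketch, not the assembler routines themselves: it assumes a compiler that offers unsigned __int128 (GCC and Clang do on 64 bit targets), it assumes unsigned long is 64 bits wide, and the names mulq_ref, divq_ref, addq_ref, widen and narrow are made up for illustration. That addq wraps modulo 2¹²⁸ is likewise an assumption of the model.

```c
typedef unsigned long int ul;           /* assumed 64 bits wide (LP64) */
typedef struct { ul q; ul r; } res;
typedef unsigned __int128 u128;         /* compiler extension, not standard C */

/* Hypothetical helpers: (res){j, k} <-> j*2^64 + k. */
static u128 widen(res v) { return ((u128)v.q << 64) | v.r; }
static res narrow(u128 x) { return (res){ (ul)(x >> 64), (ul)x }; }

/* 64 x 64 -> 128 bit product, high word in q, low word in r. */
res mulq_ref(ul a, ul b) { return narrow((u128)a * b); }

/* 128 / 64 -> quotient in q, remainder in r.
   Requires n.q < d, so that the quotient fits in 64 bits. */
res divq_ref(res n, ul d) {
    u128 x = widen(n);
    return (res){ (ul)(x / d), (ul)(x % d) };
}

/* 128 + 128 -> 128 bit sum, wrapping modulo 2^128 (an assumption here). */
res addq_ref(res a, res b) { return narrow(widen(a) + widen(b)); }
```

For example, divq_ref((res){1, 0}, 3) divides 2⁶⁴ by 3 and yields quotient 6148914691236517205 with remainder 1. A compiler will typically turn the 128 bit division into a library call rather than a single divq, which is why the assembler routines remain worthwhile.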

Warning: divq is fairly slow. On a 2.4 GHz Intel Core 2 Duo, divq takes about 22 ns.

clang m.c q.s
./a.out

produces

6700417 3
32820881070691349 462887844924339
2559402536 11508347135197814379
11000000000001 1000000000000000

which is right.

lsq below serves as a 128 bit left shift and llsrs as a 128 bit (“long long”) signed right shift. Both expect a shift count n with 0 ≤ n < 128:

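typedef long int L;
/* 128 bit signed (arithmetic) right shift; v.q is the high word, v.r the low word. */
res llsrs(res v, int n){return n<64?(res){(L)v.q>>n, (n?(v.q<<(64-n)):0)|(v.r>>n)}
          :(res){(L)v.q>>63, (L)v.q>>(n-64)};}
/* 128 bit left shift; the n? tests avoid an undefined shift by 64 when n is 0. */
res lsq(res v, int n){return n<64?(res){(v.q<<n)|(n?(v.r>>(64-n)):0),v.r<<n}
     :(res){v.r<<(n-64), 0};}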
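One way to gain confidence in the two shift routines is to compare them with the compiler's own 128 bit shifts for every shift count. The sketch below assumes GCC or Clang (for __int128, and for their arithmetic right shift of negative signed values); widen and shift_mismatches are hypothetical helper names, and the two routines are repeated only so that the block compiles by itself.

```c
typedef unsigned long int ul;
typedef long int L;
typedef struct { ul q; ul r; } res;
typedef unsigned __int128 u128;         /* compiler extension, not standard C */

res llsrs(res v, int n){return n<64?(res){(L)v.q>>n, (n?(v.q<<(64-n)):0)|(v.r>>n)}
          :(res){(L)v.q>>63, (L)v.q>>(n-64)};}
res lsq(res v, int n){return n<64?(res){(v.q<<n)|(n?(v.r>>(64-n)):0),v.r<<n}
     :(res){v.r<<(n-64), 0};}

/* (res){j, k} viewed as the 128 bit value j*2^64 + k. */
static u128 widen(res v) { return ((u128)v.q << 64) | v.r; }

/* Count the shift amounts 0..127 for which either routine
   disagrees with the compiler's 128 bit shift. */
int shift_mismatches(res v) {
    int bad = 0;
    u128 x = widen(v);
    for (int n = 0; n < 128; n++) {
        if (widen(lsq(v, n)) != (x << n)) bad++;
        if (widen(llsrs(v, n)) != (u128)((__int128)x >> n)) bad++;
    }
    return bad;
}
```

A return value of 0 means the routines and the compiler agree for that input; trying one value with the top bit set and one without exercises both the sign-filling and zero-filling paths of llsrs.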

This is the program that motivated these routines. It computes e and converts it to decimal to many digits. C does not expose the power of the divq command, which is exploited here for a particularly simple calculation. Before hardware floating point, computers relied on commands such as these for scaled fixed point arithmetic. They still have their uses.
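To give a feel for that kind of scaled fixed point calculation, here is a small sketch of one way to compute the fraction e − 2 and peel off decimal digits. It is not the program referred to above: the algorithm (Horner evaluation of the series for e, then repeated multiplication by 10¹⁹), the sizes WORDS and TERMS, and the function names are all assumptions made for illustration, and unsigned __int128 (a GCC/Clang extension) stands in for the divq and mulq commands.

```c
typedef unsigned long int ul;           /* assumed 64 bits wide (LP64) */
typedef unsigned __int128 u128;         /* stands in for mulq and divq */

enum { WORDS = 4, TERMS = 64 };         /* 256 fraction bits; 1/64! is far below 2^-256 */

/* Store e - 2 as a binary fraction, most significant word first. */
void e_fraction(ul frac[WORDS]) {
    for (int i = 0; i < WORDS; i++) frac[i] = 0;
    /* Horner form: e - 2 = (1 + (1 + (1 + ...)/4)/3)/2 */
    for (ul k = TERMS; k >= 2; k--) {
        ul rem = 1;                     /* the "1 +" enters as the initial remainder */
        for (int i = 0; i < WORDS; i++) {
            u128 n = ((u128)rem << 64) | frac[i];   /* rem < k, so the quotient fits */
            frac[i] = (ul)(n / k);      /* 128 by 64 division: the job divq does */
            rem = (ul)(n % k);
        }
    }
}

/* Multiply the fraction by 10^19 in place; the integer part that
   carries out of the top word is the next 19 decimal digits. */
ul next_19_digits(ul frac[WORDS]) {
    const ul ten19 = 10000000000000000000UL;
    ul carry = 0;
    for (int i = WORDS - 1; i >= 0; i--) {
        u128 p = (u128)frac[i] * ten19 + carry;     /* 64 x 64 product: the job mulq does */
        frac[i] = (ul)p;
        carry = (ul)(p >> 64);
    }
    return carry;
}
```

After e_fraction, the first two calls to next_19_digits return 7182818284590452353 and 6028747135266249775, the first 38 digits after the decimal point of e. Each inner loop step is exactly one 128 by 64 division or one 64 by 64 multiplication, which is where the assembler routines would slot in.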

This takes three 64 bit parameters and returns two. “clang c.c -O3 -S” produces the file “c.s”, which reveals that parameter a arrives in register %rdi, b in %rsi and d in %rdx. The result is returned in registers %rax and %rdx.

divq divides the 128 bit value in %rdx:%rax by the named divisor and puts the quotient in %rax and the remainder in %rdx. See Vol. 2A 3-221 of the Intel x86 manual: “Intel® 64 and IA-32 Architectures Software Developer’s Manual”.
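The register usage just described can also be reached from C without a separate assembler file, using the extended inline assembly that GCC and Clang provide. This is a sketch under that assumption, not the routine from this page; divq_inline is a made-up name, and the #else branch is only a fallback so the code still builds on a machine that is not x86-64.

```c
typedef unsigned long int ul;           /* assumed 64 bits wide (LP64) */
typedef struct { ul q; ul r; } res;

/* Hypothetical name; n.q is the high word of the dividend, n.r the low word.
   Requires n.q < d, otherwise the processor raises a divide error. */
static inline res divq_inline(res n, ul d) {
    res out;
#if defined(__x86_64__)
    __asm__("divq %4"
            : "=a"(out.q), "=d"(out.r)      /* quotient from %rax, remainder from %rdx */
            : "0"(n.r), "1"(n.q), "r"(d)    /* low word into %rax, high word into %rdx */
            : "cc");
#else
    /* Fallback for other targets, using the compiler's 128 bit type. */
    unsigned __int128 x = ((unsigned __int128)n.q << 64) | n.r;
    out.q = (ul)(x / d);
    out.r = (ul)(x % d);
#endif
    return out;
}
```

The "a" and "d" constraints pin the operands to %rax and %rdx, so the compiler does the register shuffling that the hand-written routine has to do itself.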

This is useful: