The code described below puts an 8 bit field element in some 64 bit word. A better approach, I think, is to put such a field value in eight consecutive words at the same bit position in the respective eight words. Eight words then hold 64 field values and 64 bit XOR instructions may be used to more efficiently do field arithmetic. Portions would then necessarily have a size whose size was a multiple of eight. Perhaps I will write some different code but not today.

This note is about CPU design, code optimizers and tangentially about GF(28) or the finite field with 256 elements. This mathematical structure is used heavily in Reed-Solomon codes (Erasure coding), and lightly, possibly in Shamir secret sharing. Being a field, and a computational field at that, there are add, multiply and divide operations to be provided. The theory is elsewhere but we examine the programming of multiply here which is usually the bulk of the necessary heavy computing. The common header file defines a few types and provides a short fast definition of multiplication. It is in a header file so as to be available for inlining by clients. Here is the core:

typedef unsigned char uc;
typedef unsigned int ui;
static const uc irp = 27;
static ui fmul(char a, ui b){uc sm=0;
  while(b) {if(b&1) sm ^= a; a = (a<<1)^((a>>7)&irp); b>>=1;}
  return sm;}
clang -Wmost null.c -S -O3
produces a file null.s which shows how clang compiles to your native instruction set. Here is null.s for Appleā€™s clang-500.2.79 for the x86. On Unix
gcc -Wall null.c -S -O3
produces a file null.s for gcc 4.4.1 (x86_64-redhat-linux).
clang tmul.c Afmul.c -O3 -Wmost
or
gcc tmul.c Afmul.c -O3 -Wall -std=c99
verifies field properties of the algebra.

Substituting Bfmul.c for Afmul.c tests a system where we precalculate 768 bytes of tables.

This computes Lagrange interpolation coefficients and moves towards code to restore lost portions.