This code was written to compute x^y mod m for large integers x, y and m. This is a high-level description of the code that specifies some internal interfaces, and these are some notes that I wrote to myself to guide the design. Here is how to call it.
The little Java program does one of three tricks:
• java -cp . zz big n
chooses an n-bit test case and writes a small file to be included when mp.c is compiled. The file provides C definitions of some big numbers that challenge the modPow routine, along with the expected answer. The main routine that contains the #include "xx.h" statement can be enabled to use this file. In a Unix command shell this can be used thus:
```
javac zz.java
java -cp . zz big 1492 > xx.h
gcc -O3 mp.c
time ./a.out
```
The Java program, zz, chooses a test case of 1492 bits and writes it to the file xx.h, which is then included in the source of the C program. The C program prints the result, and “Goodness!” if it computes the provided answer.
(2009 Apr 13: n = 14920, CPU = 1.83 GHz Intel Core Duo => 120.77s user)
(2010 Sept 30: n = 14920, CPU = 1.83 GHz Intel Core Duo => 91.02s user)
(ditto with clang, CPU = 1.83 GHz Intel Core Duo => 80.87s user)
(2011 Jan 29: n = 14920, CPU = 1.83 GHz Intel Core Duo => 91.02s user)
(ditto with clang, CPU = 1.83 GHz Intel Core Duo => 80.88s user)
(clang, n = 10920, CPU = 1.83 GHz Intel Core Duo => 30.91s user)
(gcc, n = 10920, CPU = 2.4 GHz Intel Core 2 Duo => 6.43s user)
(clang, n = 10920, CPU = 2.4 GHz Intel Core 2 Duo => 6.40s user)
(2014 June clang, n = 10920, CPU = 2.4 GHz Intel Core 2 Duo => 6.07s user)
• java -cp . zz prime n
is like the above except that the modulus is chosen as prime and the exponent is the modulus − 1. By Fermat's little theorem the answer is always 1 (for a base not divisible by the modulus), which makes a good test.
• java -cp . zz time n
is like java -cp . zz big n except that the Java method modPow is itself used 20 times to compute the answer. This allows relative timing.
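The prime and time modes can be illustrated with a small Java sketch. This is not the zz program itself: the class name ModPowSketch, its helper methods and the fixed random seed are assumptions made for illustration, and it relies only on the standard java.math.BigInteger methods probablePrime and modPow. It picks an n-bit probable prime m and a random base x below m, then computes x^(m-1) mod m 20 times to get a rough timing. The result should be 1.

```java
import java.math.BigInteger;
import java.util.Random;

public class ModPowSketch {
    // Pick an n-bit probable prime to serve as the modulus (hypothetical helper).
    static BigInteger primeModulus(int n, Random rnd) {
        return BigInteger.probablePrime(n, rnd);
    }

    // Pick a base x with 1 <= x < m, so x is not divisible by the prime m.
    static BigInteger randomBase(BigInteger m, Random rnd) {
        BigInteger x;
        do {
            x = new BigInteger(m.bitLength(), rnd);
        } while (x.signum() == 0 || x.compareTo(m) >= 0);
        return x;
    }

    // Fermat check: x^(m-1) mod m, which is 1 when m is prime.
    static BigInteger fermat(BigInteger x, BigInteger m) {
        return x.modPow(m.subtract(BigInteger.ONE), m);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1492;
        Random rnd = new Random(42); // fixed seed, an assumption for repeatability
        BigInteger m = primeModulus(n, rnd);
        BigInteger x = randomBase(m, rnd);

        // Repeat the computation 20 times, as the time mode does, and time it.
        long start = System.nanoTime();
        BigInteger r = null;
        for (int i = 0; i < 20; i++) {
            r = fermat(x, m);
        }
        long elapsed = System.nanoTime() - start;

        System.out.println("result = " + r);
        System.out.println("20 modPow calls: " + elapsed / 1e6 + " ms");
    }
}
```

Running it as java ModPowSketch 1492 gives a Java-side time that can be compared with the time reported for ./a.out on a test case of the same size.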
Here is a nice collection of tricks that I didn’t use.

I wrote the above code about 2003. Today (2014) I would try the mulq instruction, which multiplies two 64-bit words into a 128-bit product, and use perhaps 62 bits per long word instead of 30 bits per word.
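The word-size remark can be made concrete. With 30-bit digits the product of two digits is below 2^60, so it fits in a 64-bit word with room left for an accumulated digit and a carry. Digits of 62 bits would need a full 64×64→128-bit multiply. The sketch below is an assumption about how a 30-bit layout might look, not the code in mp.c: a schoolbook multiply on little-endian arrays of 30-bit digits, using the hypothetical names LimbSketch, mul, toLimbs and fromLimbs, and checked against BigInteger.

```java
import java.math.BigInteger;

public class LimbSketch {
    static final int BITS = 30;                 // bits per digit, as in the note above
    static final long MASK = (1L << BITS) - 1;

    // Schoolbook product of two little-endian arrays of 30-bit digits.
    static long[] mul(long[] a, long[] b) {
        long[] r = new long[a.length + b.length];
        for (int i = 0; i < a.length; i++) {
            long carry = 0;
            for (int j = 0; j < b.length; j++) {
                // a[i]*b[j] < 2^60, and r[i+j] and carry are each below 2^30,
                // so t stays well below 2^63 and never overflows a long.
                long t = a[i] * b[j] + r[i + j] + carry;
                r[i + j] = t & MASK;
                carry = t >>> BITS;
            }
            r[i + b.length] = carry;
        }
        return r;
    }

    // Split a non-negative BigInteger into 30-bit digits, least significant first.
    static long[] toLimbs(BigInteger v) {
        int n = Math.max(1, (v.bitLength() + BITS - 1) / BITS);
        long[] out = new long[n];
        BigInteger mask = BigInteger.valueOf(MASK);
        for (int i = 0; i < n; i++) {
            out[i] = v.and(mask).longValue();
            v = v.shiftRight(BITS);
        }
        return out;
    }

    // Reassemble 30-bit digits into a BigInteger.
    static BigInteger fromLimbs(long[] d) {
        BigInteger v = BigInteger.ZERO;
        for (int i = d.length - 1; i >= 0; i--) {
            v = v.shiftLeft(BITS).or(BigInteger.valueOf(d[i]));
        }
        return v;
    }

    public static void main(String[] args) {
        BigInteger x = new BigInteger("123456789012345678901234567890");
        BigInteger y = new BigInteger("987654321098765432109876543210");
        BigInteger viaLimbs = fromLimbs(mul(toLimbs(x), toLimbs(y)));
        System.out.println(viaLimbs.equals(x.multiply(y)));
    }
}
```

In Java the nearest counterpart to the wider multiply is Math.multiplyHigh (Java 9 and later), which returns the upper 64 bits of a signed 64×64 product; it is mentioned here only as a pointer, not as something the original code uses.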