Here is multi threaded code to do the 2 dimensional fast Fourier discrete transform, using more than one processor.
`sz` is the size of a side of the array; it must be a power of 2.

The code can employ `cpus` processors.
If there are fewer processors then things still work and the performance gain is nearly as much as if `cpus` had been an accurate cpu count.

`a` is the 2D array.

`wave` is a switch: wave=0 for spike input; wave=1 for plane wave.

`cexp` (the library complex exponential) is used in main to intialize the grid to a plane wave.

`fft` is the routine introduced here but for floats instead of double values.

The routine `psam` just prints a small select rectangle from the mesh.

`d2Dft` causes the transform of the values in `a` to replace those values.

`fp(th, m)` finds and reports the values in `a` with the greatest magnitude.

The first transform leaves one peak which reflects the mono frequency initial wave.
The peak is spread a bit because the wave was not periodic; it is discontinuous at the boundary.
After two transforms we have the mesh reversed about the center.
After four transforms we are back to the initial values.

I compile on the Mac with `clang ft.c fft.c -Wall -O3` and run with `./a.out`.

On some Linux boxes “`gcc ft.c -lm -Wall fft.c -pthread -std=c99`” works.

### Nascent Thread Nexus

Demo of __sync_fetch_and_add;
thread stuff