Here is some primitive code to read AIFF files, which normally have the extension .aiff.
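
An AIFF file is a big-endian ‘FORM’ container of chunks. A COMM chunk holds the channel count, frame count, sample size and an 80-bit extended-precision sample rate, and an SSND chunk holds the samples. The following is a minimal Python sketch of walking those chunks. It is not the author's code. The function names read_extended and parse_aiff are hypothetical, and the sketch assumes the whole file fits in memory and skips any chunk it does not recognize.

```python
import math
import struct

def read_extended(b: bytes) -> float:
    """Decode a big-endian 80-bit IEEE 754 extended float (the AIFF sample rate)."""
    sign_exp, mantissa = struct.unpack(">HQ", b)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exp = sign_exp & 0x7FFF
    if exp == 0 and mantissa == 0:
        return 0.0
    # The 64-bit mantissa carries an explicit integer bit, hence the extra 63.
    return sign * math.ldexp(mantissa, exp - 16383 - 63)

def parse_aiff(blob: bytes) -> dict:
    """Walk the chunks of an AIFF or AIFF-C file held in memory (sketch)."""
    form_id, _size, form_type = struct.unpack(">4sI4s", blob[:12])
    if form_id != b"FORM" or form_type not in (b"AIFF", b"AIFC"):
        raise ValueError("not an AIFF/AIFF-C file")
    info = {"form": form_type.decode("ascii"), "compression": b"NONE"}
    pos = 12
    while pos + 8 <= len(blob):
        ck_id, ck_size = struct.unpack(">4sI", blob[pos:pos + 8])
        body = blob[pos + 8:pos + 8 + ck_size]
        if ck_id == b"COMM":
            channels, frames, bits = struct.unpack(">hIh", body[:8])
            info.update(channels=channels, frames=frames, bits=bits,
                        rate=read_extended(body[8:18]))
            if form_type == b"AIFC":
                # AIFF-C appends a 4-byte compression type, e.g. b"sowt".
                info["compression"] = body[18:22]
        elif ck_id == b"SSND":
            offset, _block_size = struct.unpack(">II", body[:8])
            info["sound"] = body[8 + offset:]
        # Chunk data is padded to an even number of bytes.
        pos += 8 + ck_size + (ck_size & 1)
    return info
```

For a file on disk, read the bytes first, e.g. parse_aiff(open(path, "rb").read()), where path is a placeholder.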

The compression type ‘sowt’ (byte-swapped, i.e. little-endian, 16-bit samples) is explained here.
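
Once the SSND bytes are in hand, decoding 16-bit PCM differs only in byte order. ‘sowt’ (the reversal of ‘twos’) stores two's-complement samples little-endian, while uncompressed AIFF stores them big-endian. The following sketch uses Python's struct module and assumes 16-bit samples interleaved left, right, left, right. The name decode_pcm16 is hypothetical.

```python
import struct

def decode_pcm16(sound: bytes, channels: int, little_endian: bool) -> list[list[int]]:
    """Split interleaved 16-bit PCM into one list of samples per channel.

    Use little_endian=True for 'sowt', False for plain AIFF ('NONE').
    """
    count = len(sound) // 2
    fmt = ("<" if little_endian else ">") + f"{count}h"
    samples = struct.unpack(fmt, sound[:count * 2])
    # Frames are interleaved: channel 0 (left), channel 1 (right), ...
    return [list(samples[c::channels]) for c in range(channels)]
```

The left channel is then decode_pcm16(sound, 2, True)[0].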

Now I have included my FFT code for the discrete Fourier transform. This version, which I tentatively freeze, grabs 2^16 (65536) samples of the left channel and reports peaks in the transform. The sample is from near the beginning of the piece. Its duration is about 1.4861 seconds (65536/44100). The frequency step between adjacent transform outputs is F = 44100/65536 Hz = 0.67291 Hz, so output index k corresponds to frequency kF.
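
The transform step can be sketched with a textbook recursive radix-2 FFT. This is a generic Cooley-Tukey sketch, not the author's fft code. The names fft, peaks, RATE, N and F are assumptions for illustration, and a ‘peak’ here is simply an index whose magnitude exceeds a chosen threshold and is at least as large as both of its neighbors.

```python
import cmath
import math

RATE = 44100      # CD sample rate, Hz
N = 2 ** 16       # samples per transform
F = RATE / N      # frequency step between transform outputs, Hz

def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def peaks(spectrum: list[complex], threshold: float) -> list[int]:
    """Indices whose magnitude exceeds threshold and both (circular) neighbors."""
    mags = [abs(z) for z in spectrum]
    n = len(mags)
    return [k for k in range(n)
            if mags[k] > threshold
            and mags[k] >= mags[k - 1] and mags[k] >= mags[(k + 1) % n]]
```

For a hypothetical list left of samples, peaks(fft(left[:N]), threshold) lists candidate indices, and k * F converts each to Hz. Pure Python is slow for 2^16 points; with NumPy available, numpy.fft.rfft computes the same non-redundant half of the transform much faster.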

I think out loud here about how to interpret the transform. Naïvely we expect to see frequency spikes in the transform, and we do, but some cogitation helps to interpret them. If the input is a single pure note for the whole sample, and the sample holds an exact whole number, say k, of cycles of that note, then the transform output array has just two non-zero entries, one at index k and the other at index 2^16 − k. In this case the frequency of the pure note is kF. There are two because the input is real: the imaginary component of the input was unavailable and left as zero, and for real input the output at 2^16 − k is the complex conjugate of the output at k. These two non-zero outputs are complex numbers whose phase depends on the phase of the pure note at the start of the sampling interval. We ignore this phase for now.
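
The two-spike claim and the role of phase can be checked on a tiny case with a direct DFT. This is an O(N²) sum, used here only for clarity; the small N and the names are illustrative assumptions. A cosine with exactly K cycles in N samples and starting phase PHASE gives non-zero outputs only at K and N − K, each of magnitude N/2. The phase of X[K] equals PHASE, and X[N − K] is its complex conjugate.

```python
import cmath
import math

def dft(x: list[float]) -> list[complex]:
    """Direct O(N^2) discrete Fourier transform; fine for tiny N."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, K, PHASE = 16, 3, 0.7   # 16 samples holding exactly 3 cycles; phase in radians
tone = [math.cos(2 * math.pi * K * t / N + PHASE) for t in range(N)]
X = dft(tone)

nonzero = [k for k in range(N) if abs(X[k]) > 1e-9]
print(nonzero)                              # [3, 13], i.e. K and N - K
print(abs(X[K]), cmath.phase(X[K]))         # about N/2 and PHASE
print(abs(X[N - K] - X[K].conjugate()))     # about 0: conjugate symmetry
```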

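When the sampling interval does not hold a whole number of cycles of the note, the picture is more confusing, but it turns out that a note of frequency xF, with x not an integer, still yields sharp peaks at the indices nearest x and 2^16 − x in the array. Nearby entries shrink quickly as you move away from x, roughly in inverse proportion to their distance from x.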
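
Spectral leakage for a non-integer x can be seen numerically. For an unwindowed (rectangular) sample of a unit-amplitude cosine, the magnitude at index k is close to N·|sin(π(k − x))| / (2π|k − x|) when |k − x| is small compared with N. So it falls off roughly as 1/|k − x| rather than dropping to zero. The sketch below prints measured and predicted magnitudes side by side. It assumes N = 4096 and x = 1000.6, chosen only to keep the pure-Python sum quick.

```python
import cmath
import math

def dft_bin(x: list[float], k: int) -> complex:
    """One output of the discrete Fourier transform, computed directly."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))

N = 4096
CYCLES = 1000.6            # cycles per sample window: not a whole number
tone = [math.cos(2 * math.pi * CYCLES * t / N) for t in range(N)]   # amplitude 1

for k in range(995, 1007):
    measured = abs(dft_bin(tone, k))
    # Rectangular-window leakage approximation for a unit-amplitude cosine.
    predicted = N * abs(math.sin(math.pi * (k - CYCLES))) / (2 * math.pi * abs(k - CYCLES))
    print(k, round(measured, 1), round(predicted, 1))
```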

Using the CD CBS M2K 42269, “Glenn Gould, Bach(JS)/ Toccatas & Inventions [Disc 1]”, I open the disc in the Finder and drag the second ‘file’ onto the desktop. This is the Toccata in C Minor, BWV 911. There I found the bits that led to this program.

There is a peak spread over indices 586 and 587; I would judge its center to be at 586.6, and 586.6 F = 394.73 Hz. If A were 440 Hz, then the G below it would be 392 Hz.
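
The text does not say how 586.6 was judged. One common way to place a peak between two indices, for an unwindowed sample and a single steady tone, uses the ratio of the two largest magnitudes. If the tone sits at k + d, then |X[k + 1]| / |X[k]| ≈ d / (1 − d), so d ≈ r / (1 + r). This is a swapped-in technique, not necessarily the author's. The names refine_peak and dft_bin are hypothetical, and N = 4096 is used only to keep the example fast.

```python
import cmath
import math

def dft_bin(x: list[float], k: int) -> complex:
    """One output of the discrete Fourier transform, computed directly."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))

def refine_peak(p: int, m_left: float, m_peak: float, m_right: float) -> float:
    """Fractional peak position from the magnitudes at p - 1, p and p + 1.

    Assumes one dominant steady tone and no window (rectangular), for which
    a tone at k + d gives |X[k + 1]| / |X[k]| close to d / (1 - d).
    """
    if m_right >= m_left:
        k, lo, hi = p, m_peak, m_right      # tone lies between p and p + 1
    else:
        k, lo, hi = p - 1, m_left, m_peak   # tone lies between p - 1 and p
    r = hi / lo
    return k + r / (1 + r)

N = 4096                                    # smaller than 2**16 to keep this quick
tone = [math.cos(2 * math.pi * 586.6 * t / N) for t in range(N)]
m = {k: abs(dft_bin(tone, k)) for k in (586, 587, 588)}
estimate = refine_peak(587, m[586], m[587], m[588])
print(estimate)                             # close to 586.6
print(586.6 * 44100 / 65536)                # the text's 586.6 F, in Hz
```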

Now I use the last track, the Toccata in E Minor, BWV 914. There is a narrow peak at 491, which is a frequency of 491 F = 330.4 Hz. If A is 440 Hz, then the E below it is 329.6 Hz. Perhaps Gould’s piano is tuned a bit sharp.
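
Expressing both measurements in cents (hundredths of an equal-tempered semitone) makes ‘a bit sharp’ concrete. This sketch assumes equal temperament at A = 440 Hz and MIDI-style octave numbering (A4 = 440 Hz, MIDI note 69). The name nearest_note is hypothetical.

```python
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq: float, a4: float = 440.0) -> tuple[str, float, float]:
    """Nearest equal-tempered note name, its frequency in Hz, and offset in cents."""
    semis = round(12 * math.log2(freq / a4))   # whole semitones from A4
    midi = 69 + semis                          # MIDI number, A4 = 69
    name = NAMES[midi % 12] + str(midi // 12 - 1)
    ref = a4 * 2 ** (semis / 12)
    cents = 1200 * math.log2(freq / ref)
    return name, ref, cents

for f in (394.73, 330.4):
    name, ref, cents = nearest_note(f)
    print(f"{f} Hz is {cents:+.1f} cents from {name} ({ref:.2f} Hz)")
```

For the numbers above, this places 394.73 Hz about 12 cents above G4 (392.00 Hz) and 330.4 Hz about 4 cents above E4 (329.63 Hz).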

Computing the cross-correlation between the channels (the convolution of one channel with the time-reversed other) shows that one channel is about 11/44100 seconds (about 0.25 ms) ahead of the other. Here is a convolution demo.
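
Comparing the channels this way amounts to a cross-correlation: slide one channel against the other and pick the shift with the largest sum of products. The following is a direct sketch over a small range of lags, tested on synthetic noise delayed by 11 samples. The name best_lag and its sign convention (positive means the second channel lags the first) are assumptions. For long recordings, an FFT-based correlation would be much faster.

```python
import random

def best_lag(a: list[float], b: list[float], max_lag: int) -> int:
    """Lag in samples maximizing sum(a[n] * b[n + lag]) over overlapping n.

    A positive result means b is a delayed copy of a, i.e. a is ahead.
    """
    def score(lag: int) -> float:
        lo = max(0, -lag)
        hi = min(len(a), len(b) - lag)
        return sum(a[n] * b[n + lag] for n in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=score)

rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(4000)]
right = [0.0] * 11 + left[:-11]        # synthetic: right lags left by 11 samples
lag = best_lag(left, right, 40)
print(lag, lag / 44100)                # lag in samples, then in seconds
```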

