Entropy Seeker

The idea is to efficiently search a long string of computer memory for a short block of high entropy. Such a block would have a nearly flat histogram of databyte values. If h[j] (for 0≤j<256) is the histogram for such a block, then the expression E = −∑_jh[j]² will be small only when the distribution is flatish.

To search much memory efficiently for such blocks we observe that when we have a histogram for 1000 bytes starting at address p it can be quickly modified to become a histogram for the 1000 bytes starting at p+1 with about a dozen instructions. Furthermore the value for E is easily updated as well taking advantage of the fact that ∑_j<kj is k(k−1)/2 which is a surrogate for k².

Here is some code to do just this. This short code requires more explanation than usual.
Routine uc * se(uc * beg, int cc) looks for a block of high entropy starting at beg for cc bytes. It returns the address of the beginning of the first block that it finds. It returns 0 if it finds no such block.

The main loop proceeds over the area to be scanned with two cursors, fe (front end) and re (rear end) which remain at a constant distance from each other. The expressions “fe++” and “re++” provide and update the cursors. As a character is visited by one of these cursors an element of the histogram h is selected by “h+*fe++”. As the front cursor passes a byte the histogram element is incremented and as the rear cursor passes the same byte, the same element is decremented. A loop invariant is that E = ∑_jh[j](h[j]+1)/2. Another loop invariant is that h[j] is how many times byte j occurs between fe and re.