Stressing the Kernel

As we debugged the Keykos kernel we found the following simple idea very useful in tracking down infrequent bugs, even before they struck in the wild: set static compile-time parameters to extremes and run. By its nature the Keykos kernel has several sorts of dynamically allocated, multiplexed resources, such as page frames. An early design decision was to allocate a fixed-size pool of such fungible elements and always provide a means to survive when they became over-allocated. The size of each of those pools is a kernel compile-time constant. Page frames were the prime example, and by the time of Keykos such dynamic allocation of page frames was commonplace. There were roughly 10 such sorts of resources. It was useful to know the logical minimum for each, and this program helped us determine those as well. Minimizing the count of elements provoked the most bugs, as the routines that tolerate exhaustion saw a much greater variety of circumstances than in normal use. Maximizing the count exposed another category: we had discovered that reclamation of resources was masking certain bugs, and indeed such an event gave rise to this plan. A few bugs popped out when we set more than one compile-time parameter to an extreme.
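The idea can be illustrated with a toy in C. This is only a sketch under stated assumptions, not Keykos code: the names NFRAMES, frame_for and run_workload are hypothetical, and round-robin eviction stands in for whatever reclamation policy a real kernel would use. The point is that the pool size is a single compile-time constant, so the same workload can be rebuilt and rerun with the pool at its minimum or far larger than needed.

```c
#include <assert.h>

/* Hypothetical compile-time pool size. Override at build time,
   e.g. -DNFRAMES=1 for the minimum or -DNFRAMES=4096 for a large pool. */
#ifndef NFRAMES
#define NFRAMES 4
#endif

#define NO_PAGE (-1)

static int frame_page[NFRAMES];   /* page held by each frame, or NO_PAGE */
static int next_victim;           /* round-robin eviction cursor */
static unsigned long evictions;   /* how often the exhaustion path ran */

void pool_init(void)
{
    for (int i = 0; i < NFRAMES; i++)
        frame_page[i] = NO_PAGE;
    next_victim = 0;
    evictions = 0;
}

/* Return the frame holding `page`, bringing the page in if needed. */
int frame_for(int page)
{
    int slot = -1;
    for (int i = 0; i < NFRAMES; i++) {
        if (frame_page[i] == page)
            return i;                       /* already resident */
        if (frame_page[i] == NO_PAGE && slot < 0)
            slot = i;                       /* remember first free frame */
    }
    if (slot < 0) {                         /* pool exhausted: reclaim one */
        slot = next_victim;
        next_victim = (next_victim + 1) % NFRAMES;
        evictions++;
    }
    frame_page[slot] = page;
    return slot;
}

/* Touch pages 0..npages-1, `rounds` times over, checking invariants.
   Returns the number of evictions the workload caused. */
unsigned long run_workload(int npages, int rounds)
{
    pool_init();
    for (int r = 0; r < rounds; r++) {
        for (int p = 0; p < npages; p++) {
            int f = frame_for(p);
            assert(f >= 0 && f < NFRAMES);
            assert(frame_page[f] == p);
        }
    }
    return evictions;
}
```

Built with a small NFRAMES, almost every call takes the reclamation path, so that code is exercised far more than in normal use. Built with NFRAMES larger than the working set, reclamation never runs, and anything it had been quietly repairing is left exposed.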

One of IBM’s own OSes crashed too often to be useful when it ran under VM/370. Of course people thought it was a bug in VM/370, and considerable effort was required to track it down as it was very intermittent. It turned out that VM/370 was correct but provided such a large virtual TLB that it irritated a bug in the guest OS, which sometimes failed to purge the TLB when necessary. It was later determined that this bug had indeed struck in the field and that those crashes had remained unsolved. Under VM/370 the bug struck often enough to be found. Perhaps this story inspired our plan; I do not recall.

See this for a related plan.