Optimizing the Kernel

Kernel Integrity

The kernel has very simple semantics, yet its size (about 100 KB on the 370) is much larger than those semantics would suggest. The main source of the discrepancy is the extensive optimization within the kernel to achieve efficient use of:

CPU
Channels
Disks
RAM

These optimizations generally take the form of adapted data structures that reflect the logical state of the world but are shaped to make consulting and mutating that state more efficient. Sometimes these data structures replace the logical state, and sometimes they merely duplicate it. The factory patent describes many details of these optimizations.

Bugs in such code can lead to security breaches. Typically this requires the penetrator to be aware of the bug and to understand its nature in some detail. The bug’s behavior is probably nondeterministic. Nonetheless, bugs have been seen and fixed that would have allowed exploits.

There is no division between “semantics code” and “performance code”; they are as mixed as an omelet. There is reason to believe, however, that the efficient kernel is scarcely more vulnerable than an unoptimized version.

There is yet more code, called “CHECK”, that checks each of the kernel’s optimizations, each of which is carefully documented. CHECK is run before each checkpoint, and even more often shortly after a new version of the kernel is deployed. Virtually any error in optimization logic is liable to be caught by CHECK if CHECK runs soon enough after the error.
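The idea of an audit that recomputes optimized state from logical state can be sketched in C. This is only an illustration under invented assumptions: the `state` structure, the per-owner page counts, and the names `assign_page` and `check` are hypothetical and are not taken from the kernel or from CHECK itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES  8
#define NOWNERS 4
#define FREE   (-1)

/* Hypothetical state: owner[] is the logical state; count[] is a
   redundant, optimized summary that must always agree with it. */
struct state {
    int owner[NPAGES];   /* logical: owner of each page, or FREE */
    int count[NOWNERS];  /* optimization: pages held per owner   */
};

void state_init(struct state *s)
{
    for (size_t p = 0; p < NPAGES; p++) s->owner[p] = FREE;
    for (size_t o = 0; o < NOWNERS; o++) s->count[o] = 0;
}

/* Mutator that keeps the summary in step with the logical state. */
void assign_page(struct state *s, size_t page, int owner)
{
    assert(page < NPAGES && owner >= 0 && owner < NOWNERS);
    if (s->owner[page] != FREE) s->count[s->owner[page]]--;
    s->owner[page] = owner;
    s->count[owner]++;
}

/* CHECK-style audit: recompute the summary from the logical state
   alone and report whether the optimized copy still agrees. */
bool check(const struct state *s)
{
    int recount[NOWNERS] = {0};
    for (size_t p = 0; p < NPAGES; p++) {
        int o = s->owner[p];
        if (o == FREE) continue;
        if (o < 0 || o >= NOWNERS) return false;  /* corrupt owner field */
        recount[o]++;
    }
    for (size_t o = 0; o < NOWNERS; o++)
        if (recount[o] != s->count[o]) return false;
    return true;
}
```

In this sketch, any code path that changes `owner[]` without adjusting `count[]` makes `check` return false the next time it runs, which is the sense in which an audit catches an error if it runs soon enough afterward.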

There are, however, many dynamic optimizations of disk hardware that are not subject to such discipline. A combination of hardware storage protection and CHECK tends to detect blatant storage corruption by this code. This optimized code is trusted with such sensitive matters as page identity, however.

CHECK is also a reason to be optimistic about the robustness of an open source version of the kernel. Modifications to CHECK are a red flag, warning of any shortcuts taken regarding the optimized data structures.

A feature that we did not implement is a kernel irritation key: kernel primitives that drive the kernel through routines normally invoked only for reasons of performance. It would be reasonable for user mode diagnostic code to be able to cause the unpreparing of a node, an act with no semantic impact unless there are bugs in that code. Such hooks would facilitate driving the kernel into corner cases. They might be either closely held or rescindable, so that diagnostic code could not impact performance in normal use.
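A toy C model may help show why such a hook should be semantically invisible. Everything here is assumed for illustration: the `key` structure, the object table, and the functions `prepare`, `unprepare`, and `fetch` are hypothetical stand-ins, not the kernel's actual node or key representation.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJECTS 4

struct object { int id; int value; };

static struct object table[NOBJECTS] = {
    {0, 10}, {1, 11}, {2, 12}, {3, 13}
};

/* A key names its object by id (the logical state); ptr is a cached
   direct pointer (the optimized, "prepared" form). */
struct key { int id; struct object *ptr; };

/* Performance path: resolve the id once and cache the pointer. */
static void prepare(struct key *k)
{
    assert(k->id >= 0 && k->id < NOBJECTS);
    k->ptr = &table[k->id];
}

/* Hypothetical diagnostic hook: discard the cached form on demand. */
void unprepare(struct key *k)
{
    k->ptr = NULL;
}

/* Semantic operation: must return the same value whether or not the
   key happens to be prepared when it is called. */
int fetch(struct key *k)
{
    if (k->ptr == NULL) prepare(k);
    return k->ptr->value;
}
```

Diagnostic code could call `unprepare` at arbitrary moments and then confirm that `fetch` still returns the same value; a difference would expose a bug in the preparation logic.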

I think that the kernel has never had buffer overrun problems, for few strings are sent to the kernel, and kernel behavior is specified for strings that are too long. The kernel uses the MVCL (Move Long) instruction to fetch and pad the message from user code. MVCL requires a length for the destination, and there is no coding convenience in specifying any length other than that of the storage allocated to receive the data. The 370 instruction set has no instruction like strcpy. Earlier IBM computers did.
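The fetch-and-pad discipline can be approximated in C as a copy that is bounded by the destination length and fills any remainder with a pad byte. This is a rough analogue only: the function name `move_padded` is invented here, and the sketch omits MVCL details such as condition codes, register updates, and overlap handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_len bytes from src into dst, then fill whatever is
   left of dst with the pad byte. The destination length alone bounds
   the writes, so an overlong source is truncated, never overrun.
   Returns the number of source bytes that did not fit. */
size_t move_padded(unsigned char *dst, size_t dst_len,
                   const unsigned char *src, size_t src_len,
                   unsigned char pad)
{
    assert(dst != NULL && (src != NULL || src_len == 0));
    size_t n = src_len < dst_len ? src_len : dst_len;
    if (n > 0) memcpy(dst, src, n);      /* bounded copy */
    memset(dst + n, pad, dst_len - n);   /* pad the tail */
    return src_len - n;
}
```

By contrast, `strcpy` takes no destination length at all and stops only at a terminator found in the source, which is what makes overruns easy to write.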