This note concerns principles of kernel code which avoid a large class of situations where conventional kernels crash or sacrifice blameless applications.
These principles allow blameless applications to make progress on the same machine as aberrant apps that hog resources; they avoid gridlock and protect against Denial of Service.
It has occurred to me that whenever Unix
transmits data (and perhaps signals?) from one UMP (user mode program)
to another it buffers the data, whereas Keykos never does.
The UMP in Unix is the process and in Keykos is the domain.
There are several reasons for and consequences of these design decisions.
Keykos banished buffers from the kernel because they greatly complicated
checkpoint design and introduced the problem of accounting for buffers
without consumers.
In Keykos the kernel is never in the position of guarding orphans.
See Billed Buffers for Unix.
Unix has exploited byte streams at the foundations of the system, thus
simplifying many tasks.
The producer and consumer of the stream need not agree on where natural boundaries in the stream lie.
This aids the design of programs that deal with communication networks.
Another ramification of buffering is that it seems to imply that the
scheduler is involved in each transfer of data, granted that the producer
can put several batches of data into the buffer without involving the scheduler.
Still, round trips of cause and effect involve the scheduler.
Billed Buffers for Unix: Perhaps Unix could
account for orphan buffers by billing them to the consumer.
I don’t think that you can solve the various accountability problems without spelling this out in the architecture and I have seen no such notes.
Perhaps it is implicit in the specs.
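A minimal sketch of what billing buffers to the consumer might look like, written as a toy model in Python rather than as kernel code. The names `Account` and `BilledPipe`, and the idea of a fixed per-consumer byte limit, are assumptions made for illustration; nothing here describes how any real Unix accounts for pipe storage.

```python
class Account:
    """Hypothetical per-consumer storage budget, in bytes."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0


class BilledPipe:
    """Toy pipe whose buffered bytes are charged to the consumer's account."""
    def __init__(self, consumer_account):
        self.account = consumer_account
        self.data = bytearray()

    def write(self, payload):
        # Accept only as much as the consumer's remaining budget covers.
        room = max(self.account.limit - self.account.used, 0)
        accepted = payload[:room]
        self.data += accepted
        self.account.used += len(accepted)
        # A return of 0 stands for "the producer must wait".
        return len(accepted)

    def read(self, n):
        # Consuming data releases the charge against the consumer.
        out = bytes(self.data[:n])
        del self.data[:n]
        self.account.used -= len(out)
        return out
```

In this model, opening many pipes toward the same reluctant consumer does not increase the total storage that consumer can tie up, because every pipe draws on the same account.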
Keykos is committed to never running out of RAM.
As I understand it the Unix kernel cannot say no to a buffer allocation request short of crashing.
Here are some Keykos kernel desiderata that I think are recorded somewhere:
- Don’t paint yourself into a corner where you cannot proceed without allocating more buffer space but there is no more space and nothing that can be swapped out to disk.
Note that the statistical fact that there is usually something that can be swapped out does not handle the scenario of a producer of an infinite stream whose consumer does not consume.
Eventually the kernel must cease to run the producer.
Even then the question arises of the producer resuming under an assumed name.
Must the kernel refuse all requests for buffers until the original reluctant consumer has consumed?
Here is a sequence of progressively more serious challenges to the Unix kernel’s allocation principles:
- Establish a pipe between two processes where the producer produces but the consumer does no reads.
I presume that this results in a buffer of some size and then the producer is no longer scheduled.
- Put the former operation in a loop, not deleting the prior consumers and producers.
The point is that user code can produce an unbounded amount of data that the kernel has no way of removing from RAM.
I can imagine no triage that the kernel might do that would distinguish malevolent patterns, such as the above, from legitimate ones.
Any such method of distinguishing must be documented so that the application designer can comply.
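The first challenge above can be tried from user code. The following is a small sketch, assuming a Unix-like system and Python 3.5 or later; the function name `fill_pipe` and the 4096-byte chunk size are my own choices. It uses a non-blocking write so that the program observes the point at which the kernel stops accepting data, rather than being descheduled as a blocking producer would be.

```python
import os


def fill_pipe(chunk_size=4096):
    """Write to a pipe nobody reads until the kernel refuses more data.

    Returns the number of bytes the kernel buffered on our behalf.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail instead of blocking when full
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        # The pipe buffer is full; a blocking producer would stop here.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


if __name__ == "__main__":
    print("bytes held by the kernel for one idle pipe:", fill_pipe())
```

The figure printed depends on the system. The second challenge amounts to repeating this without closing the descriptors, within whatever per-process limits the system imposes.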
It might not have occurred to me that this problem was solvable had I not studied how CPU hardware worked.
When a process comes to a store instruction and the address is unmapped, most processors pretend that the store instruction had not been started and instead trap to the kernel fault logic.
Some processors instead put the invalid address and the word to be stored in two special registers and trap to the fault logic.
This avoids the logical hair of nullifying the side-effects of the store or subsequent instructions that may have finished while the relatively slow memory mapping logic was discovering that the address was invalid.
The 88K Keykos kernel merely includes these two words in the domain state and blocks the domain.
When and if the address becomes valid for this domain the kernel can interpretively perform the store and allow the domain to resume.
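The shape of that idea can be shown with a toy model. This is only a sketch in Python, not the 88K kernel's code; `Domain`, `store`, and `map_address` are hypothetical names, and a dictionary stands in for the domain's mapped memory. The point it illustrates is that the faulting store is parked in a fixed slot of the domain's own state, so the kernel needs no new storage to defer it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Domain:
    """Toy domain: a key in `memory` means that address is mapped."""
    memory: Dict[int, int] = field(default_factory=dict)
    pending_store: Optional[Tuple[int, int]] = None  # (address, word)
    blocked: bool = False


def store(domain, address, word):
    """Attempt a store; on a fault, park it in the domain and block."""
    if address in domain.memory:
        domain.memory[address] = word
        return True
    # Fault: keep the two words in the domain state, allocate nothing.
    domain.pending_store = (address, word)
    domain.blocked = True
    return False


def map_address(domain, address):
    """Make an address valid; complete a parked store if it targets it."""
    domain.memory.setdefault(address, 0)
    if domain.pending_store and domain.pending_store[0] == address:
        _, word = domain.pending_store
        domain.memory[address] = word  # perform the store interpretively
        domain.pending_store = None
        domain.blocked = False
```

A domain holds at most one parked store at a time, since it is blocked until that store completes.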
Another case is management of the RAM caches.
The hardware is never at a loss for a cache line to discard.
Cache entries all have home positions in slow RAM to which they can always be promptly and safely sent.
Some further thoughts on this.