Here are some collected notes on keepers, but see, perhaps, a slight rant on the subject first.
The idea of a keeper is loosely that of a routine that gets control when something unusual happens.
There are many instances of this idea in operating systems.
- Sometimes the hardware detects an exception, such as an invalid address, and the operating system reacts in a prearranged way by returning control to the user program, which may be able to do something useful and perhaps even continue the calculation after making the address valid.
- Most CPU architectures specify a variety of other exceptions, detected by the hardware and reported to the privileged code.
One of the advances in Keykos was to remove from the shared kernel most of the definition of the response to such events.
Two benefits resulted:
- The TCB shrank. (The function moved to user-replaceable code; your fancy response code was not in my TCB.)
- The user had the opportunity to define custom responses.
Conventional systems may provide useful user-selectable responses to these events from a fixed catalog, but these scarcely exhaust the uses to which such signals can be profitably put, and the semantics of these responses burden security arguments.
Most conventional systems will give control to a program in the address space of the program that caused the fault with enough information to repair the fault.
In most operating systems a user program can request notification of the exceptions that it causes, but there is no way for one program to request notification of accesses to invalid portions of some particular memory object, or to be notified when another program exceeds some resource limit.
These limitations preclude a number of tactics.
The original hardware exception was often designed to present a virtual construct to the program, such as virtual memory or virtual continuous operation (hiding the effects of time slicing).
The program is unaware of the operating system acting on exceptions in support of these virtual constructs.
The operating system hides evidence of the exception.
In Keykos the program that reacts to these exceptions is called a keeper and is typically hidden from the program that caused the exception just as the kernel is hidden from the user program.
The keeper thus resides outside the kernel but also outside the typical application.
Capabilities provide a natural answer to “What authority should the keeper have?”
As a consequence the keeper I invent for my application is not in your TCB.
In Keykos the virtual memory of a program is composed of segments.
Each segment may have its own keeper.
A keeper for a segment is installed by the creator of the segment.
If I create a segment and grant you access to that segment, then it is my keeper that will react to memory faults within that segment caused by your program.
I am in a position to create data within the segment when you reference some new portion of it, providing the illusion that the data was already there.
You may be concerned that my keeper not have access beyond that necessary to provide the missing data.
In particular I should not have access to your authority or even be able to read your address space, except for my segment that appears there.
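The arrangement just described can be sketched in a few lines of Python. This is only a toy model, not Keykos code: the names `Segment`, `install_page` and `squares_keeper`, and the 4096-byte page size, are assumptions made for illustration. The point it shows is that the keeper is handed only the segment and the faulting page number, and nothing else belonging to the program that faulted.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Segment:
    """Toy segment: a sparse map of page number -> bytes, plus a keeper."""

    def __init__(self, keeper):
        self._pages = {}       # pages that are currently valid
        self._keeper = keeper  # installed by the segment's creator

    def read(self, address):
        page_no, offset = divmod(address, PAGE_SIZE)
        if page_no not in self._pages:
            # "Memory fault": the keeper sees only the segment and the
            # faulting page number, never the reader's other state.
            self._keeper(self, page_no)
        return self._pages[page_no][offset]

    def install_page(self, page_no, data):
        """Service operation, meant for the keeper only."""
        assert len(data) == PAGE_SIZE
        self._pages[page_no] = data


def squares_keeper(segment, page_no):
    """Hypothetical keeper: byte i of the segment is (i * i) % 256."""
    base = page_no * PAGE_SIZE
    data = bytes(((base + i) ** 2) % 256 for i in range(PAGE_SIZE))
    segment.install_page(page_no, data)


seg = Segment(squares_keeper)
value = seg.read(10_001)  # faults once; the data then appears to have been there
```

Python does nothing to stop the reader from calling `install_page` itself; in a capability system that separation would be enforced by which keys each party holds.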
Remote Segments
Other custom keeper functions can be provided as well.
Keepers have been designed that provide the illusion that a mutable segment is resident simultaneously on remote machines.
This requires that respective keepers on the machines can communicate.
A particular page of the segment is in one of these states:
- read-write on one machine and invalid on the other
- read-only on both machines at once.
Memory faults detected by the hardware cause a message to the keeper who causes a state transition for a given page, perhaps preceded by transmission of the page content.
This is in essence the MESI protocol used by many microprocessors to keep the caches of multiprocessor systems coherent.
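The page states and transitions above can be written out as a small state machine. What follows is a sketch under assumptions, not the design of any actual remote-segment keeper: it models exactly two machines, and the class `RemotePage`, its methods, and the `transfers` counter are hypothetical names. Communication between the two keepers is reduced to copying a value.

```python
from enum import Enum


class Access(Enum):
    INVALID = "invalid"
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


class RemotePage:
    """One page of a segment shared by two machines, numbered 0 and 1."""

    def __init__(self, content=b""):
        # Initially machine 0 holds the only, writable copy.
        self.access = [Access.READ_WRITE, Access.INVALID]
        self.copy = [content, None]
        self.transfers = 0  # number of times page content was transmitted

    def _fetch(self, machine):
        """Transmit the page content from the other machine if needed."""
        other = 1 - machine
        if self.access[machine] is Access.INVALID:
            self.copy[machine] = self.copy[other]
            self.transfers += 1

    def read_fault(self, machine):
        # A read of an invalid page: both machines end up read-only.
        self._fetch(machine)
        self.access = [Access.READ_ONLY, Access.READ_ONLY]

    def write_fault(self, machine):
        # A write to an invalid or read-only page: the writer gets
        # read-write access and the other copy is invalidated.
        other = 1 - machine
        self._fetch(machine)
        self.access[machine] = Access.READ_WRITE
        self.access[other] = Access.INVALID
        self.copy[other] = None


page = RemotePage(b"hello")
page.read_fault(1)   # machine 1 reads: content is sent, both become read-only
page.write_fault(1)  # machine 1 writes: it already has the content, no transfer
```

Each page is always in one of the two listed states: read-write on one machine and invalid on the other, or read-only on both.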
Other notes bearing on keepers in Keykos:
- My Keykos architecture paper has a section largely organized around keepers.
It uses jargon and concepts introduced earlier in the paper, however.
- How segment keepers may measure access to data.
- A note on meter keepers and implementing an external scheduling policy.
- A general description of Keykos segments mentions keepers frequently.
- The expired patent describes keepers extensively with a patentese perspective.
The keepers in Keykos conform to the following pattern:
- Some kept object implements a conventional function
- There are two capabilities to that object:
  - The capability to use the conventional function
  - The capability to service the object
- The implementation of the object includes a capability to the keeper, whose behavior is defined by user code.
- Upon an exception, detected by the code that defines the object’s behavior, a message is sent to the keeper that includes:
  - The service capability to the object
  - A capability to restart the process that caused the exception.
  - Some data indicating the nature of the exception.
- The keeper typically has some authority of its own.
At least it has authority to consume some resources to do the work.
- The service capability typically provides the keeper with access to those more primitive objects of which the kept object is built.
It is with this access that the keeper does its work.
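The pattern above can be illustrated with a deliberately trivial kept object. This is a hedged sketch in Python, not a Keykos interface: the bounded counter, the facet classes `UseFacet` and `ServiceFacet`, and `generous_keeper` are all invented for the example. The two facets stand in for the two capabilities, and the keeper receives the three items listed: a service capability, something with which to restart the caller, and data describing the exception.

```python
class KeptObject:
    """Kept object: a bounded counter; reaching the bound is the 'exception'."""

    def __init__(self, limit, keeper):
        self.limit = limit
        self.count = 0
        self.keeper = keeper  # capability to the keeper (user-defined code)


class ServiceFacet:
    """The capability to service the object; handed only to the keeper."""

    def __init__(self, obj):
        self._obj = obj

    def raise_limit(self, extra):
        self._obj.limit += extra


class UseFacet:
    """The capability to use the conventional function."""

    def __init__(self, obj):
        self._obj = obj

    def increment(self):
        obj = self._obj
        if obj.count >= obj.limit:
            # Exception detected by the kept object's own code: send the
            # keeper a service capability, a restart capability, and data.
            restart = self.increment
            return obj.keeper(ServiceFacet(obj), restart, "limit reached")
        obj.count += 1
        return obj.count


def generous_keeper(service, restart, reason):
    """Hypothetical keeper: repair the condition, then restart the caller."""
    service.raise_limit(1)
    return restart()


obj = KeptObject(limit=1, keeper=generous_keeper)
counter = UseFacet(obj)
first = counter.increment()   # within the limit
second = counter.increment()  # traps to the keeper, which repairs and restarts
```

A keeper that restarts without repairing anything would loop here; the sketch leaves that hazard in place rather than guess how a real kept object would bound it.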
We must inquire why some functions are in the kept object and others are in the keeper.
Here are some discriminants:
- Some correctness and security properties can stem from the type of the kept object regardless of the nature of the keeper.
Thus varying the keeper’s type while holding the type of the kept object constant combines some advantages of polymorphism and of security thru known types.
The nature of the message to the keeper is part of the spec of the kept object.
This allows for security arguments.
- An example is the non-prompt space bank whose keeper is invoked upon storage exhaustion.
The bank service key in this case is merely the normal bank key.
The keeper’s job is to free up storage via capabilities it must already hold.
- The kept object may be built from dangerous primitives that cannot be made generally available.
Part of the duty of the code defining the behavior of the kept object is to avoid damage to others while using those primitives.
In the cases that come to mind the kept object multiplexes some shared resource, and the keeper customizes the behavior.
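The non-prompt space bank mentioned above makes a compact example. The following Python is a sketch under assumptions and does not reproduce the Keykos space bank interface: `SpaceBank`, `CacheKeeper` and the `expendable` list are hypothetical. It shows the two properties named in the text: the keeper is handed only the normal bank key (here, the bank object itself), and it frees storage using page keys it already holds.

```python
class SpaceBank:
    """Toy bank that hands out page numbers up to a fixed capacity."""

    def __init__(self, capacity, keeper=None):
        self.capacity = capacity
        self.allocated = set()
        self._next_id = 0
        self._keeper = keeper

    def allocate(self):
        if len(self.allocated) >= self.capacity and self._keeper is not None:
            # Non-prompt: instead of failing at once, invoke the keeper,
            # passing the ordinary bank key as the service key.
            self._keeper(self)
        if len(self.allocated) >= self.capacity:
            raise MemoryError("space bank exhausted")
        page = self._next_id
        self._next_id += 1
        self.allocated.add(page)
        return page

    def free(self, page):
        self.allocated.discard(page)


class CacheKeeper:
    """Hypothetical keeper holding pages it is willing to give up."""

    def __init__(self):
        self.expendable = []  # page keys the keeper already holds

    def __call__(self, bank):
        if self.expendable:
            bank.free(self.expendable.pop())


keeper = CacheKeeper()
bank = SpaceBank(capacity=2, keeper=keeper)
keeper.expendable.append(bank.allocate())  # page 0, marked expendable
bank.allocate()                            # page 1; the bank is now full
page = bank.allocate()                     # keeper frees page 0, then this succeeds
```

Whatever the keeper does, the bank never has more than `capacity` pages outstanding, which is the sort of property that follows from the type of the kept object regardless of the keeper.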
Here is a brief note on domain keepers.
Core Ideas of Keepers
I don’t know how many separate ideas or patterns there are.
Here are some:
- An object that you can trust even though some part of its behavior is defined by untrusted code.
There are two orthogonal questions here: Is the untrusted code confined? Is it isolated?
- To minimize code with excessive authority.
This is the case with Keykos segment keepers.
Code in the kernel has infinite authority and needs to be minimized.
Even when the removed code is universally used, this plan diminishes the number of ways that things can go wrong.
Composition
A dirty little secret about keepers is that they don’t compose.
If two keepers exist for the same construct, it is not generally possible to compose them so as to have their combined benefits.
Most often it is not possible to be clear what the combined specifications would be.
Normally in combining advantages human judgment is necessary to allow for issues of precedence or ordering of effects.
This effect manifests itself in Keykos in that the service keys are not meant to be shared.
Nor is it feasible to design them so that they may be shared by programs that are unaware of each other.
Scheduling
Segment keepers and meter keepers have a problem with concurrent users of the kept object.
A meter keeper is called when a meter runs dry.
There may be several processes running under that same meter and it is possible that another of these processes will try to run before the keeper has replenished the meter.
This is in fact likely if the keeper delays as a scheduling policy.
Subsequent users of the meter will accumulate on the keeper’s stall queue.
When the keeper replenishes the meter and restarts the originally trapped domain, the kernel will note that the keeper has become available and put just the first member of the stall queue on the CPU queue.
Two of the scheduled domains will thus promptly resume operation but other queue members will await service from the worry queue server.
This makes scheduling thru meters unduly sluggish.
It has been suggested that a kernel tool be available to move all of a stall queue to the CPU queue.
This would require the domain service key or perhaps just a gate key to the domain with the queue.
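The difference between the current behavior and the suggested tool can be seen in a toy model of the two queues. This Python sketch is an assumption-laden simplification: `keeper_returns` and its `wake_all` flag are hypothetical, the queues are plain deques of domain names, and the worry queue server is not modeled at all.

```python
from collections import deque


def keeper_returns(trapped, stall_queue, cpu_queue, wake_all=False):
    """Model the moment a keeper restarts the originally trapped domain.

    wake_all=True models the suggested kernel tool; it is a proposal in
    the text above, not something assumed to exist.
    """
    cpu_queue.append(trapped)  # the originally trapped domain resumes
    if wake_all:
        # Suggested tool: move the whole stall queue to the CPU queue.
        while stall_queue:
            cpu_queue.append(stall_queue.popleft())
    elif stall_queue:
        # Current behavior: only the first stalled domain is scheduled.
        cpu_queue.append(stall_queue.popleft())
    return cpu_queue, stall_queue


# Domain A ran the meter dry; B, C and D stalled on the busy keeper.
cpu, stalled = keeper_returns("A", deque(["B", "C", "D"]), deque())
# Here cpu holds A and B, while C and D are left waiting.

cpu_all, stalled_all = keeper_returns("A", deque(["B", "C", "D"]), deque(),
                                      wake_all=True)
# With the suggested tool all four domains are on the CPU queue.
```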
If several users of a kept segment touch the same page before the segment keeper has had time to reify that page, the same problem arises.
In practice the problem is much less likely, for the keeper probably has no reason to stall.
The keeper is likely to remove the obstacle by placing a key in a memory tree and become available by returning to the trapped domain.
The keeper thus becomes promptly available to serve other faults.
This means that one segment keeper domain can drive concurrent I/O operations.
If the keeper instead supplies the page by computing the segment contents, the problem described for the meter keeper arises.
The same proposed solution should serve in this case as well.
See this for some issues in extending the keeper pattern to objects defined outside the kernel.
The keeper pattern.