We gather here some ideas about multiplexing hardware among a set of contending users. Multiplexing, here at least, means allocating and rapidly reallocating hardware components to those ‘users’ who need the function of the hardware. I draw examples mainly from the Keykos kernel where there are several sorts of multiplexed ‘units’ or discrete hardware elements to be ‘timeshared’. Another source of ideas are the tactics of virtual machine monitors to timeshare real and synthetic machine components to provide the functional illusion of several complete systems.
In this note “kernel” refers to that code written to accomplish this multiplexing. The Keykos kernel mainly does this multiplexing plus a few other vital functions.
Perhaps the most prominent sort of shared unit is the CPU. ‘Timeslicing’ usually refers to switching the CPU between tasks who know nothing of each other and whose design is oblivious to multiplexing. Like other units, the CPU has state which is logically attached to the user but must be bound to the unit while the unit serves the purposes of the user. As the unit serves a user, this state is typically modified and when it comes time to reallocate the unit to another user the state must be preserved by the kernel for later resumption. Commonly software, provided by the user and obeyed by a CPU, manipulates the state of the units. Commonly at most one unit is allocated to a user at an instant. In that sense a user here corresponds to what Unix calls a ‘thread’. All this is familiar to anyone who has pondered kernel design and these outlines apply as well to units other than CPUs.
We mention another sort of multiplexing which has not been implemented to my knowledge but exemplifies another tactic and motive. Most newer processors segregate floating point registers and the cost of saving this floating state burdens even programs who do not use floating commands. This state can be large in vector architectures and the cost of saving and restoring significant. Some modern processors include a privileged hardware bit which the kernel may use to deny to user code, access to the floating registers, and thus also the floating point functional units. This bit is in the PSW and there is thus no extra cost in loading it. We explore this idea here in the Keykos context. We imagine an implementation where two domains (e.g. routine and subroutine) can be configured to share a floating state, not because such sharing is strategic, but because it is easier to allow it than to disallow it. The motive here is to avoid the cost of saving and restoring floating state. In general the kernel semantics involve introducing a ‘floating state’ construct to be referred to in the construct that defines the user. The internal kernel constructs are merely a cell per processor identifying the domain, if any, to whom the real floating state belongs. Floating state is loaded only upon execution of a floating command by a user whose state was saved due to intervening use of floating commands by another user. Floating point state is saved only upon need by another user for floating functions.
The IBM 370 system includes PER (Program Event Recording) hardware, bound to a CPU, which is useful for debugging. It may be used to efficiently implement debugger functions commonly called ‘breakpoints’ and ‘watchpoints’. There are privileged PER registers that the debugger sets that parameterize the PER hardware for the code to be debugged. On the 370 there is a PSW bit that enables PER. When disabled the registers preserve their state but have no effect on the running code. When enabled they will cause interrupts depending on program logic and content of PER registers. This pattern exactly fits the floating point state strategy above. If one user of the machine is using PER then the PER registers will contain his values and no saving or restoring will occur. This is exactly like the proposed floating case described above. Only the one user will have a PSW with the PER bit therein set to one. This PSW value will reside in the real PSW while the CPU is allocated to that user, and saved along with the standard CPU state at other times. I think you will see that there is 0 overhead in this common case. As two users contend for PER the only overhead is when one user gets a batch of CPU time slices not interleaved with time slices of the competing user. It is one hardware reallocation per batch of slices. The Keykos kernel did exactly this on the 370. See ‘PEROWNER’ in the 370 Keykos kernel logic manual. In use, the PER state did not change which meant that saving PER state was unneeded.
Another nearly universal kernel function is allocation of RAM page frames to hold the pages known by users. In Keykos pages are part of segments which are what users used, but no difference. Pages are brought to memory upon reference. Keykos segment logic often leads to sharing of pages between users.
IBM's VM/370 would multiplex real disks by allocating contiguous cylinders (a ‘mini disk’) to users on a long term basis, and allocating the IO channel and disk controller on a short term basis. The virtual disk had fewer cylinders than real disks but IBM disk users were already well advised not to assume how many cylinders there were on a disk. There were software tools valuable for the operator to move a mini-disk from one real disk-pack to another.
VM/370 also provided virtual card readers, even if the real machine lacked them. There were also virtual card decks, produced by virtual card punches or real card readers. Such readers, punches and decks were synthetic and there was no logic nor need to multiplex them. The operator could allocate an arbitrary IO device to a virtual machine and the 370 channel architecture assured that the kernel need not understand what sort of device it was. There are standards for Firewire, USB and other standard interconnects that should do the same for those worlds. I don’t know whether devices conform to test standards.