Modern processors that I have studied recently have modes that deny access to the floating state. In such a mode user instructions that attempt such access are cleanly trapped so as to grant access “at the last possible instruction” as it were. A general Keykos inclination to postpone work until absolutely necessary seems to win here.
This user access bit (UAB) may already reside in a privileged register that must be restored for other reasons upon transition to user mode. This seems not to be true of the x86, but that makes the scheme only a bit slower. In the following I use “image” to refer to the process state kept in RAM while the process is not running. The image is loaded into various registers as the process starts, and those parts that may be changed by execution of user mode instructions are preserved upon traps and interrupts of the process.
The frequent kernel paths are coded as if there were no floating point hardware, and the kernel does not itself use the floating point for its work. In particular the floating state is neither saved nor restored with the rest of the process image upon entry to and exit from the kernel. When a domain is prepared its UAB image is set to deny access, regardless of the existence of floating values in the domain’s image. (The image is the data that is loaded from RAM to privileged registers by the kernel in preparing to begin obeying the user mode instructions of the domain, typically a register with a name such as “process state”.) Some domains (depending on the particular hardware architecture) may lack storage space for a floating point image.
If a user instruction attempts to access floating state, the kernel is entered and the domain is examined to see if it is equipped with a floating point image. The domain gets a fault code if it lacks such floating values. If it has an FP image then the floating state is prepared (to use a Keykos kernel term) by allocating real storage to hold the image in a form which is quick for saving and restoring the real floating point registers, where the state lives during execution of the program. Of course the floating image may already be prepared.
As with the domain’s DIB this image can always be immediately reclaimed as it merely reflects those floating values otherwise held in the swapped domain annex, just as a cache reflects RAM contents. After the floating image is prepared it remains to verify that the real hardware floating state for the processor is not already occupied by the only copy of some other domain’s state.
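The trap path described so far can be sketched as a small Python model. This is only an illustration under stated assumptions, not Keykos kernel code: the names `Domain`, `on_fp_trap`, `reclaim_fp_image` and the fault code string are hypothetical, and the write-back on reclaim is my assumption about how the prepared image stays consistent with the annex.

```python
from dataclasses import dataclass
from typing import List, Optional

NO_FP_IMAGE_FAULT = "no-floating-image"  # hypothetical fault code name


@dataclass
class Domain:
    name: str
    annex_fp: Optional[List[float]] = None     # floating values in the swapped annex; None if the domain has none
    prepared_fp: Optional[List[float]] = None  # real-storage copy, quick to save and restore
    uab: bool = False                          # UAB image: False means "deny access"
    fault: Optional[str] = None


def prepare_domain(d: Domain) -> None:
    # Preparing a domain always denies floating access,
    # whether or not the domain has floating values.
    d.uab = False


def on_fp_trap(d: Domain) -> bool:
    """Kernel entry after a user instruction touched floating state.

    Returns True if the domain now has a prepared floating image.
    """
    if d.annex_fp is None:
        d.fault = NO_FP_IMAGE_FAULT   # domain is not equipped with floating values
        return False
    if d.prepared_fp is None:         # the image may already be prepared
        d.prepared_fp = list(d.annex_fp)
    return True


def reclaim_fp_image(d: Domain) -> None:
    # Assumption: reclaiming writes the values back to the annex,
    # treating the prepared image as a cache of the annex.
    if d.prepared_fp is not None:
        d.annex_fp = list(d.prepared_fp)
        d.prepared_fp = None
```

Note that `on_fp_trap` stops short of touching the real registers; whether the hardware is occupied by another domain's state is a separate check.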
For each real processor with floating hardware, there is a real storage location LF that locates the storage area allocated to hold the image that is now in the real hardware floating point registers. In design X that area will belong to some particular domain that has most recently obeyed floating point commands on this processor. In design Y that area will be shared by (a club of) domains which share floating state. LF will be null exactly when the real hardware state is already duplicated in RAM. Such state is left in the real hardware registers to optimize the case where only one domain (or club) is frequently accessing such state. In such cases floating state need not be saved or restored upon context switches. If LF indicates that the real registers are occupied, then LF says where that state must first be saved.
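A minimal model of the LF bookkeeping may make the saving rule concrete. It is a sketch only: `FpArea`, `Processor`, `grant` and the `saves`/`loads` counters are hypothetical names introduced for illustration, and the registers are modeled as a plain list of floats.

```python
from typing import List, Optional


class FpArea:
    """Real storage for one floating image (one domain in design X, one club in design Y)."""

    def __init__(self, values: List[float]):
        self.values = list(values)


class Processor:
    def __init__(self, nregs: int = 8):
        self.fp_regs: List[float] = [0.0] * nregs
        self.lf: Optional[FpArea] = None  # None: register contents are duplicated in RAM
        self.saves = 0                    # counters only for observing the traffic
        self.loads = 0

    def save_if_occupied(self) -> None:
        # If LF is non-null the registers hold the only copy; LF says where to save it.
        if self.lf is not None:
            self.lf.values = list(self.fp_regs)
            self.lf = None
            self.saves += 1

    def grant(self, area: FpArea) -> None:
        """Give the owner of `area` access to the real floating registers."""
        if self.lf is area:
            return                        # state is still in the registers; nothing to move
        self.save_if_occupied()
        self.fp_regs = list(area.values)
        self.lf = area                    # registers may now diverge from RAM
        self.loads += 1
```

With one domain using floating point and any number of others that do not, repeated calls to `grant` for the same area move no data, which is the optimized case the text describes.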
Prepared domains are in one of two states:
Another possibility is that the domain’s floating state currently resides in some real processor. In fact it may reside in the real processor that took the fault, but kernel design X would avoid this. In design X we turn on the UAB just as we load the real floating state. In design Y we turn on a domain’s UAB only after the domain has trapped because its UAB was 0.
A surprising thing about this scheme is that domains can share floating state if that state lives in its own (sharable) domain annex. I don’t know if it is easier to allow (design Y) or prevent (design X) this. I cannot think of a compelling reason to allow it except as it may be simpler for the kernel. In this scheme it would seem unnecessary to consider the floating state to belong to the domain any more than the address segment belongs to the domain. (By belong I refer to the habit of saying that a domain (in 370 Keykos) consists of three nodes.)
Page 263 of volume 1 states that the MMX state is aliased to the floating point state. Bit TS in CR0 disables MMX instructions. Vol. 3a, section 2.5 (page 67) describes the TS bit of CR0 as controlling access to SSE* state as well as floating point state (section 12.8.1 of the Feb 2014 edition). The text seems to use “task switch” as a technical term; see vol. 3a, §6.3. §6.4.2 talks about task switches and introduces the “task gate descriptor”. It recommends Chapter 5 of vol. 3b. Vol. 1, section 11.6.10.2 speaks of XMM state and task switches. Page 414 of vol. 1 refers to Chapter 6 of vol. 3a for more on task switching. After an hour or so with Adobe Reader I sort of think that the TS bit in CR0 does its thing for the unified SSE* and floating states. I think that EROS and Capros do not use the task structures defined by the x86 architecture. I presume that bit TS in CR0 can be explicitly set.
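To relate the UAB to the x86, here is a small model of the TS bit treated as an inverted UAB. The facts relied on are that TS is bit 3 of CR0, that the CLTS instruction clears it, and that a floating point instruction executed with TS set raises the device-not-available exception (#NM, vector 7). The function names are hypothetical, CR0 is modeled as a plain integer, and the interaction with the EM bit is ignored.

```python
CR0_TS = 1 << 3   # TS ("task switched") is bit 3 of CR0
NM_VECTOR = 7     # device-not-available exception (#NM)


def set_ts(cr0: int) -> int:
    # Models the kernel writing CR0 with TS set: deny floating access.
    return cr0 | CR0_TS


def clts(cr0: int) -> int:
    # Models the effect of the CLTS instruction: grant floating access.
    return cr0 & ~CR0_TS


def fp_instruction_traps(cr0: int) -> bool:
    # With TS set, an x87/MMX/SSE instruction is trapped (vector NM_VECTOR)
    # before it can touch the floating state.
    return bool(cr0 & CR0_TS)
```

In this reading, "UAB denies access" corresponds to TS = 1, and the kernel's #NM handler is where the floating image would be prepared and loaded before TS is cleared.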