Logic of the Virtual Machine

As most machine architectures evolved, the architected state of the machine became divided into two categories. This separation appeared first in the hardware and only later in the organization of the definition of the architecture. We use notions and terminology introduced here except that we use “machine mode” in place of “machine state”. We gather here some requirements on the architecture of a machine that enable the sort of virtualization that we describe here. Indeed, this is the only sort of virtualization that I am aware of. The portion of the architected state that is accessible to problem mode instructions is called the problem state; the rest of the CPU state is called privileged state.

The problem state architecture is very much like a simple computer architecture with no IO facilities but with the complete facilities needed to compute efficiently. This architecture has no facilities to allow one program to dominate another, except by emulation.

By virtual machine we refer here to an illusion of a real computer, called the guest, that is provided by a program that we call here the kernel. The architecture of the problem mode instructions of the real computer is the same as that of the guest and the CPU of the real computer directly executes the instructions of the guest at native speed. During such bursts of problem mode guest instructions the real machine is in problem mode regardless of the mode of the guest. The real machine must trap to the kernel upon encountering a privileged instruction while in problem mode. When a privileged mode instruction is encountered in the guest program the hardware refuses to execute it and the kernel gains control to interpret that instruction relative to the privileged state kept by the kernel for that guest. The privileged architecture of the guest is defined entirely by the logic of the kernel. The kernel can present any privileged architecture that it wants.

The kernel provides several guests at once, running just one at a given moment, as in time-sharing. The kernel allocates varying amounts of real memory to a guest and employs the real memory map to provide the illusion of the memory of the guest. Just as kernels of conventional systems allocate real memory only to recently accessed memory of the application, so does our kernel provide the illusion of the guest’s memory. The guest thus has access to all possible addresses, at least on real hardware that does not use RAP. The real memory map must not use RAP, or else the performance cost must be acknowledged and the feasibility of the kludge described in the RAP page must be proven. By swapping to disk the kernel can provide an illusion of more memory than the real machine has, just as conventional OS kernels do.
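The allocation of real memory to only the recently accessed pages of a guest can be sketched as follows. This is a toy model under assumptions of my own: the class name GuestMemory, the least-recently-used eviction rule, and the dictionary standing in for the disk are all hypothetical and are not taken from any real kernel.

```python
from collections import OrderedDict


class GuestMemory:
    """Toy model: the guest sees guest_pages pages, backed by at most real_frames frames."""

    def __init__(self, guest_pages, real_frames):
        self.guest_pages = guest_pages
        self.real_frames = real_frames
        self.resident = OrderedDict()  # guest page -> contents, least recently used first
        self.swap = {}                 # guest page -> contents held "on disk" (assumption)
        self.faults = 0                # page faults handled by the kernel, unseen by the guest

    def _touch(self, page):
        if not 0 <= page < self.guest_pages:
            raise IndexError("address outside the guest's memory")
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as recently used
            return
        self.faults += 1
        if len(self.resident) >= self.real_frames:
            # Evict the least recently used page to the swap area.
            victim, data = self.resident.popitem(last=False)
            self.swap[victim] = data
        # Bring the page in from swap, or supply a zeroed page on first use.
        self.resident[page] = self.swap.pop(page, 0)

    def read(self, page):
        self._touch(page)
        return self.resident[page]

    def write(self, page, value):
        self._touch(page)
        self.resident[page] = value
```

A guest given 8 pages but only 2 real frames can still write and read back all 8 pages; the only visible difference is the time spent on faults.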

The kernel must maintain a map from features of the guest to the real machine. This map may vary with time, as when a memory page of a guest is sometimes in real memory and sometimes swapped out to disk. Sometimes a page of the real machine may be in the image of more than one guest, as in the case when two guests are running the same guest kernel. Likewise immutable portions of a disk may be shared. Hardware to access a network may be multiplexed by the kernel so as to provide the illusion to each guest of its own network access. Portions of real disks can be allocated to guests to provide disks for guests.

The kernel maintains the virtual state of the CPU for each guest. To allocate the real CPU to a guest the kernel copies the problem mode state of the guest to the real CPU but leaves the privileged state of the real CPU in a mode suitable to regaining control from the guest after some time limit or interrupt or trap. The kernel can then put the real CPU in real problem mode so as to execute the instructions found in the virtual memory of the guest at full speed.
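The dispatch step above might look like the following sketch. The names RealCPU, Guest, ProblemState, dispatch and undispatch are hypothetical, and the fields are a drastic simplification of any real CPU state. The point to notice is that only problem state is copied; the real privileged state is set up by the kernel for its own purposes, and the guest's virtual mode bit is left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemState:
    registers: list = field(default_factory=lambda: [0] * 16)
    pc: int = 0


@dataclass
class RealCPU:
    problem: ProblemState = field(default_factory=ProblemState)
    privileged_mode: bool = True  # the kernel itself runs in real privileged mode
    timer: int = 0                # real interval timer, part of the privileged state
    trap_handler: str = "kernel"  # who regains control on a trap or interrupt


@dataclass
class Guest:
    name: str
    problem: ProblemState = field(default_factory=ProblemState)
    virtual_privileged_mode: bool = True  # the guest's own idea of its mode


def dispatch(cpu, guest, time_slice):
    """Give the real CPU to a guest for at most time_slice ticks."""
    # Copy only the problem state of the guest onto the real CPU.
    cpu.problem = ProblemState(list(guest.problem.registers), guest.problem.pc)
    # Arrange the real privileged state so the kernel regains control.
    cpu.timer = time_slice
    cpu.trap_handler = "kernel"
    # The real machine runs in problem mode whatever the guest's virtual mode is.
    cpu.privileged_mode = False


def undispatch(cpu, guest):
    """Save the guest's problem state when the kernel regains control."""
    cpu.privileged_mode = True
    guest.problem = ProblemState(list(cpu.problem.registers), cpu.problem.pc)
```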

Since the program in the guest runs in problem mode it has no direct access to the real privileged state. If that program was written to be run in privileged mode it may encounter privileged instructions, which will cause traps to the kernel. The kernel can interpret such instructions according to the privileged state of the guest. This virtual state includes the bit which determines the mode of the guest, and this bit determines the interpretation of the privileged instruction, of course. If the guest is in privileged mode and the privileged instruction has to do with IO then the kernel consults its map for the guest to see if there is real hardware that should be operated to serve the requirements of the privileged instruction in the guest.
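How the virtual mode bit steers the interpretation can be sketched like this. The opcode names SET_PROBLEM_MODE and START_IO, the device_map dictionary, and the string results are all invented for illustration. The sketch also assumes the usual treatment of a guest in virtual problem mode: the kernel reflects the trap to the guest's own trap handler, as real hardware would have done.

```python
from dataclasses import dataclass, field


@dataclass
class GuestState:
    virtual_privileged: bool = True                 # the guest's virtual mode bit
    device_map: dict = field(default_factory=dict)  # virtual device address -> real device
    log: list = field(default_factory=list)         # record of what the kernel did


def on_privileged_trap(guest, opcode, operand):
    """Called by the kernel when the real CPU traps on a privileged instruction."""
    if not guest.virtual_privileged:
        # The guest believes it is in problem mode, so it must see the trap
        # itself. Its own trap handler then runs in virtual privileged mode.
        guest.log.append(("reflect", opcode))
        guest.virtual_privileged = True
        return "reflected"
    if opcode == "SET_PROBLEM_MODE":
        # Only the virtual bit changes; the real CPU stays in problem mode.
        guest.virtual_privileged = False
        return "emulated"
    if opcode == "START_IO":
        # Consult the kernel's map for this guest to find real hardware, if any.
        real_device = guest.device_map.get(operand)
        if real_device is None:
            return "no-device"
        guest.log.append(("io", real_device))
        return "emulated"
    return "unknown"
```

The same START_IO is emulated when the guest is in virtual privileged mode and reflected when it is not, which is all that the mode bit needs to accomplish.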

The x86 instruction set has an instruction, SLDT, that is not privileged despite accessing part of the privileged state of the CPU. A few other instructions like this make a kernel for the x86 complex and slow. Techniques that are slow, expensive in space, and go far beyond the scope of this note are necessary to virtualize such machines. There were other problems with virtualizing the early x86 architecture. Later versions of the x86 architecture have ameliorated these problems but added much complexity. Somehow the IBM 370 dodged these problems, with the partial exception of the real time clock. When Wang built a clone of the 370 they added features that did not conform to the requirements separating privileged and problem states. It could not be virtualized.

When CP-67 was the state of the art it was common knowledge that it was impossible to provide guests with memory maps. When IBM announced VM/370 they noted that it would provide complete virtual machines including the map. Knowing that it was possible did indeed lead to the insight of how to do it. That insight needs to be recorded here.

When the guest executes the privileged instructions to define a memory map the kernel becomes aware of the need to provide the illusion of the guest’s memory map, which means consulting the memory map built by the guest. The kernel is in a position to do this, of course. When the guest is to run in a mode that requires that map to be effective, the kernel builds a ‘shadow map’ for the guest that is distinct from the memory map normally used for that guest. The shadow map is the composition of the map built by the guest program and the map built by the kernel to serve the guest. This shadow table plays the role of the guest’s TLB, and the rules for programs that build memory maps are naturally such that the shadow table is to be erased when the guest issues a purge TLB. The exact nature of the 370 TLB varies widely between models, but precise rules are given that are sufficient to write programs that run on all models. The shadow tables provided by IBM’s kernel were larger than any real TLB, and this exposed bugs that violated these rules and that, on real hardware, had led to infrequent mysteries. When such bugs struck much more frequently while running as a guest they were easier to find and fix.
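The composition of the two maps, and the sense in which the shadow table behaves like a large TLB, can be shown in a few lines. This is a sketch with hypothetical names (ShadowMap, translate, purge) and with plain dictionaries in place of real page tables; it is not IBM's implementation.

```python
class ShadowMap:
    """Shadow table = (map built by the guest) composed with (map built by the kernel)."""

    def __init__(self, guest_map, kernel_map):
        self.guest_map = guest_map    # guest virtual page -> guest "real" page
        self.kernel_map = kernel_map  # guest "real" page -> frame of the real machine
        self.shadow = {}              # guest virtual page -> real frame, filled on demand

    def translate(self, vpage):
        if vpage in self.shadow:
            return self.shadow[vpage]   # like a TLB hit: the guest's map is not consulted
        gpage = self.guest_map[vpage]   # a KeyError here models a fault shown to the guest
        frame = self.kernel_map[gpage]  # a KeyError here models a fault the kernel handles
        self.shadow[vpage] = frame
        return frame

    def purge(self):
        """What the kernel does when the guest issues a purge TLB."""
        self.shadow.clear()


guest_map = {0: 5, 1: 6}
kernel_map = {5: 100, 6: 200, 7: 300}
shadow = ShadowMap(guest_map, kernel_map)

first = shadow.translate(0)   # composed through both maps
guest_map[0] = 7              # guest edits its map but forgets to purge
stale = shadow.translate(0)   # still the old frame, exactly as a TLB would behave
shadow.purge()
fresh = shadow.translate(0)   # now reflects the guest's new map
```

Because the shadow table never forgets an entry on its own, a guest that edits its map without purging sees the stale translation every time, not just occasionally. That is the mechanism by which such bugs became frequent enough to find.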

The IO facilities for the guest in VM/370 are noteworthy. They are described in fair detail here. There was almost no privileged code that was specialized to a particular sort of device. (No drivers!) Guests were equipped with card readers and punches, all virtual of course. They were also equipped with a queue of decks, which real machines lacked unless simulated by a human operator. There was a magic ‘privileged’ instruction that would cause the next ‘deck’ that was punched to be put into some other guest’s input deck queue. These facilities, being all virtual, did not suffer from the performance problems of real cards. The card reader of the real machine could accept input decks to be routed to some queue as specified by the first card. Ditto for card output.
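The deck queues can be modeled roughly as below. Everything here is an assumption made for illustration: the class Spool, its method names, and the rule that an unrouted deck is simply handed back (standing in for the real punch). The method route_next_deck plays the part of the magic ‘privileged’ instruction.

```python
from collections import deque


class Spool:
    """Toy model of virtual card punches and per-guest input deck queues."""

    def __init__(self, guest_names):
        self.reader_queue = {n: deque() for n in guest_names}  # decks waiting to be read
        self.punch_buffer = {n: [] for n in guest_names}       # cards of the deck in progress
        self.punch_target = {n: None for n in guest_names}     # routing for the next deck

    def punch(self, guest, card):
        self.punch_buffer[guest].append(card)

    def route_next_deck(self, guest, target):
        # Stand-in for the magic instruction: the next deck goes to target's queue.
        self.punch_target[guest] = target

    def close_deck(self, guest):
        deck, self.punch_buffer[guest] = self.punch_buffer[guest], []
        target = self.punch_target[guest]
        self.punch_target[guest] = None  # routing applies to one deck only
        if target is not None:
            self.reader_queue[target].append(deck)
        return deck

    def read_deck(self, guest):
        queue = self.reader_queue[guest]
        return queue.popleft() if queue else None
```

In this model one guest hands work to another by routing a deck, punching its cards, and closing the deck; the receiver finds it at the head of its reader queue with no real cards involved.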

Further notes

IBM compares and contrasts the larger gamut of ‘virtual machines’.