Meltdown

These are my notes on the Meltdown paper.

If you are up on your x86 microarchitecture (I wasn’t), you might want to start with the heading “Step 1:” on page 8. Many pages teach us about speculative execution, and they suggest that speculation is triggered by a conditional branch whose condition is not yet available. Here, for the first time, we hear that the CPU treats whether it is in privileged mode as a fact that will take some time to resolve, and so it proceeds, speculatively, as if loading from a kernel-only virtual address were OK. Some time later, those speculated instructions are prevented from retiring when it is discovered that the CPU is in user mode.

“During the retirement, any interrupts and exception that occurred during the execution of the instruction are handled.” I find this too vague. (My notes) What is the nature of the mechanism that ‘knows there is an interrupt to handle’? Many other instructions produce such contingencies. Does every load or store operation cause the speculator to stash a register map just in case it must return to that state? The x86 eventually notices, even for speculatively executed instructions, when a user-mode load fetches a word from kernel memory. If the speculator is in user mode, the outcome of that load is no more in doubt than the destination of an unconditional jump, and there is no more reason to go on speculating past it. Speculation can manifestly be stopped, as it is when the accessed page is in neither the TLB nor the memory map of which the TLB is a cache. I can imagine no microarchitecture in which the privileged nature of the access is not known before the speculated load is actually performed. Such a load can only pollute the cache.
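One textbook answer to the question above, sketched as a toy Python model: each in-flight instruction occupies a reorder-buffer (ROB) entry that can carry a pending-exception flag, results become architectural only when an entry retires in program order, and a flagged entry discards itself and everything younger. All names here (`Core`, `Entry`, `SECRET_ADDR`, `PROBE`, the 64-byte line) are hypothetical, and this is not a description of any Intel core; it only illustrates why squashing register state at retirement leaves the cache footprint of the squashed loads in place.

```python
from dataclasses import dataclass, field

LINE = 64                  # assumed cache-line size in bytes (toy value)
SECRET_ADDR = 0xFFFF0000   # hypothetical kernel-only address
PROBE = 0x1000             # hypothetical user-readable probe array of 256 lines

@dataclass
class Entry:
    dest: str              # destination register
    value: int             # result, computed speculatively
    fault: bool = False    # pending exception, acted on only at retirement

@dataclass
class Core:
    regs: dict = field(default_factory=dict)   # architectural registers
    rob: list = field(default_factory=list)    # in-flight entries, oldest first
    cache: set = field(default_factory=set)    # numbers of the cached lines

    def load(self, dest, addr, memory, user_mode, kernel_only):
        """Execute a load out of order; the privilege check only records a flag."""
        self.cache.add(addr // LINE)           # the access brings its line into cache
        fault = user_mode and addr in kernel_only
        self.rob.append(Entry(dest, memory[addr], fault))
        return memory[addr]                    # dependent instructions use it at once

    def retire(self):
        """Retire oldest first; a faulting entry discards itself and all younger ones."""
        while self.rob:
            head = self.rob.pop(0)
            if head.fault:
                self.rob.clear()               # registers never see the squashed work
                return "fault"                 # self.cache is left exactly as it is
            self.regs[head.dest] = head.value
        return "ok"

memory = {SECRET_ADDR: 42, **{PROBE + i * LINE: 0 for i in range(256)}}
core = Core()
secret = core.load("rax", SECRET_ADDR, memory, True, {SECRET_ADDR})
core.load("rbx", PROBE + secret * LINE, memory, True, {SECRET_ADDR})
status = core.retire()
print(status, core.regs, (PROBE + 42 * LINE) // LINE in core.cache)
```

Run as-is, it prints `fault {} True`: the registers never receive the secret, but the probe line indexed by it stays cached. In this toy no register map is stashed per load, because registers are written only at retirement; real cores use register renaming with their own recovery machinery, which this does not model.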

Is there a better time to stop speculating than when the memory map entry from the TLB is found to match the address of the booty? (I presume the TLB entry includes the protection level of the target page; how else could a protection violation be detected?) By the time the load fetches the word from kernel memory, it has already noticed that the word is kernel-only.
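The presumption in parentheses matches the x86 paging format as I understand it: bit 2 of each paging-structure entry is the U/S (user/supervisor) flag, a page is user-accessible only if U/S is set at every level of the walk, and Intel's manual describes TLB entries as caching the combined access rights. The toy below (hypothetical `tlb`, `fill_tlb` and `translate`; 4 KiB pages; an assumed 52-bit physical address) shows the permission falling out of the same lookup that yields the physical address. It says nothing about what real hardware does with that bit during speculation, which is exactly the point Meltdown turns on.

```python
PAGE_SHIFT = 12                       # 4 KiB pages
PRESENT = 1 << 0                      # P flag (bit 0)
USER = 1 << 2                         # U/S flag (bit 2): 1 = user access allowed
FRAME_MASK = 0x000F_FFFF_FFFF_F000    # physical-address bits (assumes 52-bit PA)

tlb = {}  # hypothetical TLB: virtual page number -> (frame base, user_ok)

def fill_tlb(vaddr: int, walk: list[int]) -> None:
    """Cache a completed walk (non-empty, root to leaf) with the AND of its U/S flags."""
    user_ok = all(e & PRESENT and e & USER for e in walk)
    tlb[vaddr >> PAGE_SHIFT] = (walk[-1] & FRAME_MASK, user_ok)

def translate(vaddr: int, user_mode: bool) -> int:
    """Look up the TLB; the permission is known here, before any data is read."""
    frame, user_ok = tlb[vaddr >> PAGE_SHIFT]   # KeyError stands in for a TLB miss
    if user_mode and not user_ok:
        raise PermissionError("user-mode access to a supervisor page")
    return frame | (vaddr & ((1 << PAGE_SHIFT) - 1))

KERNEL_VA = 0xFFFF_8000_0000_1234
fill_tlb(KERNEL_VA, [0x1003, 0x2003, 0x3003, 0x5_0003])   # bit 2 clear: supervisor-only
print(hex(translate(KERNEL_VA, user_mode=False)))
try:
    translate(KERNEL_VA, user_mode=True)
except PermissionError as err:
    print("blocked:", err)
```

The supervisor-mode lookup prints `0x50234`; the user-mode lookup of the same page is refused at the TLB, before any word is fetched.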

It seems the CPU mode is not part of the speculated state, as register contents are. I see a strange paragraph on page 158 of Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1:

Current privilege level (CPL) — The CPL is the privilege level of the currently executing program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of the code segment from which instructions are being fetched. The processor changes the CPL when program control is transferred to a code segment with a different privilege level. The CPL is treated slightly differently when accessing conforming code segments. Conforming code segments can be accessed from any privilege level that is equal to or numerically greater (less privileged) than the DPL of the conforming code segment. Also, the CPL is not changed when the processor accesses a conforming code segment that has a different privilege level than the CPL.
Descriptor privilege level (DPL) — The DPL is the privilege level of a segment or gate. It is stored in the DPL field of the segment or gate descriptor for the segment or gate. When the currently executing code segment attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on the type of segment or gate being accessed:
- Data segment — The DPL indicates the numerically highest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment.
- Nonconforming code segment (without using a call gate) — The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment.
- Call gate — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the call gate. (This is the same access rule as for a data segment.)
- Conforming code segment and nonconforming code segment accessed through a call gate — The DPL indicates the numerically lowest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment.
- TSS — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.)
Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has sufficient privilege to access the segment, access is denied if the RPL is not of sufficient privilege level. That is, if the RPL of a segment selector is numerically greater than the CPL, the RPL overrides the CPL, and vice versa. The RPL can be used to insure that privileged code does not access a segment on behalf of an application program unless the program itself has access privileges for that segment. See Section 5.10.4, “Checking Caller Access Privileges (ARPL Instruction),” for a detailed description of the purpose and typical use of the RPL.
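The quoted rules reduce to a few comparisons. Here is a minimal Python paraphrase covering only what the passage above states; the names `Target` and `access_allowed` are my own, and the manual's later sections add conditions that this sketch does not model.

```python
from enum import Enum

class Target(Enum):
    DATA = 1           # data segment; the quoted rule is the same for call gates and TSS
    NONCONFORMING = 2  # nonconforming code segment reached without a call gate
    CONFORMING = 3     # conforming code, or nonconforming code reached through a call gate

def access_allowed(target: Target, cpl: int, rpl: int, dpl: int) -> bool:
    """Paraphrase of the quoted rules; 0 is most privileged, 3 least."""
    if not all(0 <= level <= 3 for level in (cpl, rpl, dpl)):
        raise ValueError("privilege levels run from 0 to 3")
    if target is Target.DATA:
        # A numerically greater RPL overrides the CPL, so the weaker of the two
        # must still be at least as privileged as the DPL.
        return max(cpl, rpl) <= dpl
    if target is Target.NONCONFORMING:
        # The caller must already be at exactly the segment's level.
        return cpl == dpl
    # Conforming code, or code behind a call gate: allowed when the caller is at
    # the DPL or a less privileged (numerically greater) level.
    return cpl >= dpl

# The three examples from the quoted text, for CPL 0..3 (RPL equal to CPL):
print([access_allowed(Target.DATA, c, c, 1) for c in range(4)])
print([access_allowed(Target.NONCONFORMING, c, c, 0) for c in range(4)])
print([access_allowed(Target.CONFORMING, c, c, 2) for c in range(4)])
```

The three printed lists reproduce the manual's examples: CPL 0 and 1 may use a data segment with DPL 1, only CPL 0 may use a nonconforming code segment with DPL 0, and CPL 0 and 1 may not use a conforming code segment with DPL 2.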

The whole of Section 5.5 seems alien to me. I recall studying Intel’s gate concept many decades ago and deciding I could not use it. I wonder whether anyone uses it.

I know of no system using privileged mode that allows some code to be executed in either state; certainly not Keykos, nor the VMs I have studied.

The Wikipedia article suggests that there are no chips in the field with the TSX (Transactional Synchronization Extensions) feature. Intel manuals suggest the feature has been available again since about 2015. The paper suggests that the authors’ particular exploit uses TSX.