I should use SDRAM everywhere below in place of DRAM!
I record here some notes on plausible modifications to the interface between DRAM and its caches. I import from here the terminology “cycle” and “operation” as applied to DRAM. You don’t need to read that if you understand the following sentence: A write command (to DRAM) requires a read cycle followed by a write cycle, and a read command also requires those two cycles. We propose here to modify those rules and omit some of those cycles.
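As a small illustration of the rule in that sentence, here is a sketch of the conventional cycle accounting that the rest of these notes propose to relax. The names `Cycle` and `conventional_cycles` are mine, invented for this note; they are not DRAM terminology or any controller's API.

```python
from enum import Enum
from typing import List


class Cycle(Enum):
    READ = "read"    # the read cycle
    WRITE = "write"  # regeneration after a read, or storing new data


def conventional_cycles(command: str) -> List[Cycle]:
    """Cycles spent on one command under the unmodified rules."""
    if command not in ("read", "write"):
        raise ValueError(f"unknown command: {command!r}")
    # Both commands cost the same pair of cycles: a read cycle followed
    # by a write cycle (regeneration for a read, new data for a write).
    return [Cycle.READ, Cycle.WRITE]
```

Under these rules a line that is read and later written back costs four cycles in all. That is the baseline against which the omissions proposed below should be compared.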
This idea is predicated on the kernel keeping track of DRAM pages that are accessed with write authority. The kernel must account for which caches might hold data from which DRAM pages. The IBM 370 insulated the kernel from all such considerations. More recent systems do not.
Consider a DRAM page holding a fragment of code that has not been compiled recently. There is no good reason to build a page table entry that grants write access to that page. This observation does not relate to correctness but to performance. When a cache miss causes a read to this page, the mapping hardware becomes aware that no write access is intended, and this fact is recorded in the cache line. Such a cache line is manifestly clean. In this case the memory system instructs the DRAM to perform the normal regeneration (write) cycle following the read cycle. If a clean cache line is evicted from the cache, the rewrite command to DRAM is skipped.
If the memory system fills a cache line having located the DRAM page via a page table entry that grants write access, there is almost no scenario where there is benefit to a regeneration, which we thereby propose to omit. The single valid copy of the data is then in the cache. If another cache wants that data then consulting the DRAM will report “data not here”, at least if we are doing ECC. This is soon enough to invoke cache snooping which, while faster, uses more critical system resources.
There is a design option here. We might hope that the line will remain unmodified, and so we mark it clean and command DRAM to regenerate its data. If upon eviction the line is still clean then no DRAM update is necessary. In any case the eviction of a dirty cache line causes a write command, but (!) if the cache line were marked dirty (upon load) and concomitantly the DRAM were left unregenerated, then the normal read cycle can be omitted and the eviction can proceed directly to the write cycle. The design decision is whether to have a cache line state that means “DRAM is empty”. Without that state we must mark a cache line loaded via a RW page table entry as dirty.
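The fill and eviction rules of the last three paragraphs can be sketched as a tiny state machine that counts DRAM cycles. This is only my reading of the scheme, assuming the variant that does have a separate “DRAM is empty” line state. The names `LineState`, `fill`, `store`, `evict` and `total_cycles` are hypothetical, and the model ignores snooping, ECC and timing.

```python
from enum import Enum
from typing import List, Tuple


class LineState(Enum):
    CLEAN = "clean"            # DRAM was regenerated and still matches the cache
    DIRTY = "dirty"            # DRAM was regenerated but the cache has newer data
    DRAM_EMPTY = "dram_empty"  # regeneration was skipped; the cache has the only copy


def fill(pte_writable: bool, skip_regeneration: bool) -> Tuple[LineState, List[str]]:
    """Load a line; return its initial state and the DRAM cycles spent."""
    if pte_writable and skip_regeneration:
        return LineState.DRAM_EMPTY, ["read"]   # regeneration omitted
    return LineState.CLEAN, ["read", "write"]   # normal read plus regeneration


def store(state: LineState) -> LineState:
    """A CPU store dirties a clean line; the other states are unchanged."""
    return LineState.DIRTY if state is LineState.CLEAN else state


def evict(state: LineState) -> List[str]:
    """DRAM cycles needed when the line leaves the cache."""
    if state is LineState.CLEAN:
        return []                 # DRAM already holds the data
    if state is LineState.DRAM_EMPTY:
        return ["write"]          # read cycle omitted, write only
    return ["read", "write"]      # ordinary write command


def total_cycles(pte_writable: bool, skip_regeneration: bool, modified: bool) -> int:
    """Cycles over one fill-to-eviction lifetime of a line."""
    if modified and not pte_writable:
        raise ValueError("a line from a read-only page cannot be modified")
    state, cycles = fill(pte_writable, skip_regeneration)
    if modified:
        state = store(state)
    return len(cycles) + len(evict(state))
```

In this model a read-only line costs two cycles, an optimistically clean RW line costs two if it stays unmodified and four if it is modified, and a line loaded without regeneration costs two either way. What the cycle count does not show is the price of that last case: the cache holds the only valid copy until eviction.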
If DMA for disk is ignorant of the cache then the kernel must cause any dirty lines in the cache to be written back. When a new logical page is created and a consequent DRAM page frame is required, then it is necessary to clear that frame. A sequence of read cycles by either the CPU or DMA suffices for this.
The overriding invariant is that valid data is found by looking first in the cache and, if it is not found there, consulting the memory map (including page table entries) to find the DRAM and, failing that, trapping.
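The lookup order in that invariant can be written out as a few lines of code. This is a deliberately simplified sketch: the cache is indexed by address rather than by line, the memory map is a flat dictionary from page number to frame number, and `PAGE_SIZE`, `Trap` and `load` are names I made up for the illustration.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class Trap(Exception):
    """Raised when neither the cache nor the memory map can supply the data."""


def load(address: int,
         cache: Dict[int, int],
         memory_map: Dict[int, int],
         dram: Dict[int, int]) -> int:
    """Find valid data: cache first, then DRAM via the memory map, else trap."""
    if address in cache:
        return cache[address]                  # the cache wins whenever it has the data
    page, offset = divmod(address, PAGE_SIZE)
    if page not in memory_map:
        raise Trap(f"no mapping for address {address:#x}")
    frame = memory_map[page]                   # page number -> DRAM frame number
    return dram[frame * PAGE_SIZE + offset]
```

A real memory system would also fill the cache on the DRAM path and choose the line state as discussed above; the sketch shows only the order of consultation.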
This whole plan does little to reduce critical paths. It does unload what may be a critical system resource: access to DRAM. Dynamic decisions, such as whether to mark cache lines from mutable pages as dirty, might be made depending on the nature of a queue for access to that DRAM.
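One conceivable form of such a dynamic decision is a simple threshold on the depth of the DRAM request queue. The policy, the function name `should_skip_regeneration` and the default threshold below are assumptions of mine, offered only to make the idea concrete; I have not evaluated them.

```python
def should_skip_regeneration(pte_writable: bool,
                             queue_depth: int,
                             threshold: int = 4) -> bool:
    """Decide at fill time whether to omit the regeneration cycle.

    Hypothetical policy: lines from read-only pages are always regenerated.
    Lines from mutable pages skip regeneration (and are loaded dirty) only
    when the DRAM queue is at least `threshold` requests deep.
    """
    if not pte_writable:
        return False
    return queue_depth >= threshold
```

With a short queue this policy regenerates and hopes the line stays clean; with a long queue it saves the cycle now and accepts a write at eviction.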
I think that DDR3 SDRAMs do not have commands to support this idea. DDR4 is no better.
Shared mutable RAM is a hard abstraction to use. Shared mutable RAM is an expensive illusion to provide in case of sharing between different caches. I don’t know a general alternative.