This is about caches in a larger and longer term prospect. The anchoring concept here is to run machine language programs designed to concurrently share read-write memory, while supporting well the more common programs that do not share memory. Also tricky is detecting the difference. We try here to find the logical properties necessary for a system of caches to serve such a set of memory clients. I try here to record the logical properties of a cache so that a network of such caches can serve as a virtual memory system for some collection of memory clients. I assume the logical notion of memory which means roughly that when you submit an address in a read request to memory, it returns the value most recently stored there with a write request.

The phrase ‘most recently’ above presumes a simple ordering in time of memory requests. We retain the right to define this ordering to be any ordering ‘compatible’ with the ‘real order’. I will not be more precise here except to note that there may be signal paths outside the memory system between memory system clients that must limit our choice of ordering. Even pickier thoughts.

I posit here a sort of node (hopefully capturing the notion of cache). The node holds a discrete set of data elements which we may as well call lines that are identified by addresses. A node has a particular limits, which we view here as inscrutable, on what subsets of lines it can hold. We require, however, that any cache can hold any particular line. It may be necessary, in some applications, to require that any cache be able to hold any particular pair of lines. (The 370 TLB was logically required to hold an arbitrary subset of 8 entries due the nature of the instruction set which might create 8 requests in one unit of operation.) The node ‘links to’ several other neighboring nodes but has one special neighbor called the base. ‘Links to’ is a symmetric irreflexive relation which we probably must require to define a spanning tree covering all the nodes. The base is special as it will always agree to a message of the form ‘take this here line because I have no room for it.’.

There may be an ultimate node (the bottom of the cache hierarchy) which has no base neighbor. We must decide whether to count memory clients as nodes. If so then those nodes have only their base as a neighbor. If not then some nodes have neighbors that are not nodes.

Pragmatically some nodes are smaller and faster than others and the base neighbor of a node is slower than the node.

Remote Keykos Segments is a kludge that fits these patterns. Key to that solution is the ability to detect transmission of ‘addresses’ across ‘membrane’ boundaries. The hack would not work if real 370 addresses played the rôle of ‘address’ in the principle cache discussion above. Nor can what Keykos calls the virtual address play that rôle. The closest thing to a named concept for that rôle is ‘offset within segment’ along with segment identifying information. The bits within a capability on one machine are related to the bits within the ‘same capability’ on another machine only by programs with the responsibility of ‘exporting’ such capabilities. These programs are in the TCB of only those applications who explicitly rely on them. Much discussion on these issues is found in this large directory. Two good starting points, perhaps: Capabilities and Distributed Systems, Concurrent Distribution.
I think that this is the paper where Belady shows how to effectively compare a cache replacement algorithm with the perfect foresight algorithm. The latter is cannot be implemented, but the comparison may tell you that you could not have done much better in some particular execution run—it tells you when you can no longer improve your algorithm.
Here is an idea that I think is not new or, at least, very little of it is new. I probably heard some scheme much like from Dick Arnold at IBM in 1968. This paper suggests that this idea appears in “A. Agarwal, J. Hennessy, and M. Horowitz. cache performance of operating systems and multiprogramming. In ACM Transactions on Computer Systems, 6, pages 393−431, November 1988.”.

You have primary and secondary hash functions. For a cache that can respond to two requests per cycle process two requests when they occur in the same at their respective primary hash locations. If there is only one request during a cycle use both hash addresses for that request. Always look at the secondary hash location before going to backing store or snooping. When a secondary hash succeeds move line to primary cache set if there is room there. Before expelling a line, see if there is room in the set at its secondary hash. It is cheap to know whether a cache set is full. It is fairly easy to know whether a cache set is under pressure even when full.