One change is that while we probably trust a local disk to make only innocent errors, we will not want to extend that trust to the link and the store. We might trust the store and not the link, relying on encryption and authentication on the link, but here we explore not trusting even the store for integrity. This leads me immediately to the hash tree notion; I have not considered others. A different scheme identifies data to the store by its hash; it seems intriguing but I don’t pursue it, for reasons noted elsewhere.
When a page is summoned from or sent to the store, I assume that it is identified by a dense binary code called the address. The address is intended to be like a physical disk address, and several incentives to achieve reference locality in this space will follow, in part out of habit. The address of data directly locates the data in the hash tree: adjacent 4-bit nibbles of the address index into successive hash groups.
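A minimal sketch of this nibble addressing; the function name and Python framing are mine, chosen only for illustration:

```python
def nibble_path(address: int, depth: int) -> list[int]:
    """Split a page address into 4-bit nibbles, most significant
    first; each nibble indexes one slot of a 16-hash group on the
    path from the root of the hash tree down to the page."""
    return [(address >> (4 * (depth - 1 - i))) & 0xF for i in range(depth)]

# A 20-bit address descends through five 16-way hash groups:
nibble_path(0xA3F01, 5)  # -> [10, 3, 15, 0, 1]
```

Reference locality in the address space then shows up as shared nibble prefixes, so nearby pages share most of their hash path.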
I implicitly assume that the client holds any hash group that includes hashes of lower groups or pages. There is opportunity here for the hashes under discussion to ensure transmission integrity. I ignore that opportunity just now.
An early design decision is whether the store is cognizant of the hash tree. We briefly examine the two alternatives and their immediate ramifications:
1. Store requests to the store always include the entire hash path for the modified data. This maintains the invariant that the hash groups at the store are always up to date.
2. Store requests merely present the data and its address. This provides some implementation flexibility. The stored hash groups have no external address in this scheme.
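The two request shapes might be sketched like this; the type and field names are my invention, purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class CognizantStoreRequest:
    """First alternative: the store maintains the hash tree, so
    every write carries the entire hash path for the modified page."""
    address: int
    page: bytes             # the 4KB page being written
    hash_path: list[bytes]  # updated hash groups, root downward

@dataclass
class OpaqueStoreRequest:
    """Second alternative: the store sees only data and address; any
    hash groups it keeps have no external address."""
    address: int
    page: bytes
```

In the first form the store can check each write against the tree as it arrives; in the second, all checking is the client's burden.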
The radio link can lead to an agent in the network that forwards store requests to two different stores and read requests to one or the other. This employs redundant stores without extra radio traffic. The client may want to verify the acknowledgments of store requests independently in order to make both stores accountable. This scheme greatly decreases vulnerability to any one store.
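A toy version of such an agent; the class, its methods, and the acknowledgment scheme are my invention, not part of the note:

```python
import hashlib

class MirroringAgent:
    """Network-side agent that fans each store request out to two
    stores and directs reads to one of them, so redundancy costs
    the client no extra radio traffic."""

    def __init__(self, store_a: dict, store_b: dict):
        self.stores = [("A", store_a), ("B", store_b)]

    def write(self, address: int, page: bytes) -> list[bytes]:
        # Forward the write to both stores and collect a distinct
        # acknowledgment from each, so the client can verify them
        # independently and hold both stores accountable.
        acks = []
        for name, store in self.stores:
            store[address] = page
            acks.append(hashlib.sha1(name.encode() + page).digest())
        return acks

    def read(self, address: int) -> bytes:
        # Read requests go to one store or the other; here,
        # simply the first.
        return self.stores[0][1][address]
```

A real design would want the acknowledgments signed by each store rather than computed by the agent, which is itself untrusted.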
These ideas so far seem to preclude sharing at the store. Books and big software seem like logical candidates for sharing. It seems unnecessary to encrypt books out of copyright or open source software. Other material must probably be encrypted on the link. Dissemination of a crypto key might control access to encrypted information. Alternatively the store might embrace capability discipline and deliver protected data as if it were mapped into the address space in the above sense of “address”.
If checkpoints are identified by hash then there needs to be a safe place to store those hashes — safe from dropping the client in the lake. A per-client circular page buffer at the store, which keeps the most recent versions of the page holding those hashes and can be retrieved by password, seems simple. This is a funny way to bottom out! Another way is for the client to send e-mail to the client’s owner with the root hash for checkpoints.
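A toy version of that per-client buffer, assuming a password check and a fixed depth; the names and API are mine:

```python
from collections import deque

class RootHashBuffer:
    """Per-client circular buffer at the store, holding the most
    recent checkpoint root hashes and released only on presentation
    of the password. A sketch; a real store would persist this."""

    def __init__(self, password: str, depth: int = 8):
        self._password = password
        self._recent = deque(maxlen=depth)  # old entries fall off

    def store(self, root_hash: bytes) -> None:
        self._recent.append(root_hash)

    def retrieve(self, password: str) -> list[bytes]:
        if password != self._password:
            raise PermissionError("bad password")
        return list(self._recent)  # oldest first, newest last
```

The password here is the "bottoming out": it is the one secret that must survive the lake.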
I want to separate encryption from integrity, for I have been stung by making the natural connection. I will thus speak as if the store holds plain text. We must then require of a crypto plan that small changes lead to small changes. We may also require a metric over the state space of a file system; perhaps the Hamming metric. Moving the contents of a 10MB file over by 8 bits thus counts as a big change. This, in turn, burdens applications to encode changes to text files as in-place updates. Question: does Keykos provide a good place to install code to do this stuff without burdening the app? This is closely related to the question of where to put file compression code in Keykos. Curiously, the .pdf file format already supports updates at the end.
A store state is a dense array of page states. A page state is 4KB, to match common memory maps. The client holds the root hash of a (Merkle) hash tree of one or two store states. The tree element is the concatenation of 16 SHA-1 hashes, each of a lower-level element or of a page. The store keeps the pages of a store state and can reproduce the tree elements, but may well store them too; that is up to the store. Some states are coherent but the store makes no distinction. One message to the store asks it to return a page and k of the elements above that page. The arguments are page index, version number and k. A client that knows some of the elements can call for just those it doesn’t have. I assume the client holds the elements above the elements and pages that it holds. It is not clear how important it is to be able to call for less than a page. The client can verify the page content starting from the root hash.
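The verification walk can be sketched as follows; the names are mine, and each element ("group") is the 320-byte concatenation of 16 SHA-1 hashes as described above:

```python
import hashlib

HASH_LEN = 20  # SHA-1 digest size

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def nibble_path(address: int, depth: int) -> list[int]:
    """4-bit nibbles of the page address, most significant first;
    each selects one of the 16 hashes in a group."""
    return [(address >> (4 * (depth - 1 - i))) & 0xF for i in range(depth)]

def verify_page(root_hash: bytes, address: int, page: bytes,
                groups: list[bytes]) -> bool:
    """Check a returned page against the trusted root hash, given
    the hash groups on the path from root to page; groups[0] is the
    top group, whose SHA-1 must equal the root hash."""
    expected = root_hash
    for nib, group in zip(nibble_path(address, len(groups)), groups):
        if sha1(group) != expected:
            return False
        # The nibble selects the 20-byte hash of the next level down.
        expected = group[nib * HASH_LEN : (nib + 1) * HASH_LEN]
    return sha1(page) == expected
```

A client that already holds some elements would pass only the missing ones and begin the walk from a hash it trusts, rather than from the root.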
The reason for multiple states at the store is to support checkpoints. The Hamming distance between the two states is typically small and it is necessary to rely on that in the engineering and pricing of the store.
Design question: May a particular page be written to the store more than once per version? If not, then a checkpoint by the client could proceed as follows: