Distrusting the Disk
or Radio Link Displaces Disk Channel

Imagine a Keykos (or Eros) system in your pocket. It has a digital radio link, since it also doubles as a cell phone. It might include a small disk, but let's explore using the radio link to a remote store instead, where data is stored reliably for a long time, presumably on disks. We will call it “the store” for now, for lack of a better name. The client is the thing in your pocket.

One change is that while we probably trust a local disk to make only innocent errors, we will not want to extend that trust to the link and store. We might trust the store and not the link, using encryption and authentication on the link, but we explore here not trusting even the store for integrity. This leads me immediately to the hash tree notion and I have not considered others. There is also a scheme where data is identified to the store by its hash; it seems intriguing but I don’t pursue it for reasons listed there.

When a page is summoned from or sent to the store, I assume that it is identified by a dense binary code called the address. The address is intended to be like a physical disk address, and several incentives to achieve reference locality in this space will follow, in part out of habit. The address of data directly locates the data in the hash tree: adjacent 4-bit nibbles of the address index into the hash groups.
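As a concrete sketch of that nibble indexing, here is one way the client might turn an address into per-level group indices. The fanout of 16 matches the 16-hash groups described later; the names and the particular tree depth are my own illustration, not part of the design.

```python
# Hypothetical sketch of 4-bit nibble indexing into hash groups.
# GROUP_FANOUT and nibble_path are illustrative names, not from the design.

GROUP_FANOUT = 16  # one 4-bit nibble selects one of 16 hashes in a group

def nibble_path(address: int, levels: int) -> list:
    """Split an address into per-level group indices, most significant first."""
    return [(address >> (4 * level)) & 0xF
            for level in range(levels - 1, -1, -1)]

# A 5-level tree covers 16**5 = 1,048,576 pages, i.e. 4 GB of 4 KB pages.
print(nibble_path(0xA3F01, 5))  # [10, 3, 15, 0, 1]
```

Reference locality in the address space then translates directly into shared prefixes of these nibble paths, and hence into shared hash groups.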

I implicitly assume that the client holds any hash group that includes hashes of lower groups or pages. There is opportunity here for the hashes under discussion to ensure transmission integrity. I ignore that opportunity just now.

An early design decision is whether the store is cognizant of the hash tree. We briefly examine the two alternatives and the immediate ramifications:

Store is not Hash Savvy

In this scheme the hash groups are generated by the client and kept in the store and the store is unaware of their purpose. Awkwardly, the hash groups are not page size but this can be easily kludged around. Any read request to the store includes requests for the hash groups not already held by the client. The client can easily compute the address of these groups. As data and groups are delivered to the client, their hashes are computed and compared with the stored values.

Requests to store data always include the entire hash path for the modified data. This maintains the invariant that the hash groups at the store are always up to date.
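A minimal sketch of that write side, assuming the SHA-1 groups of 16 concatenated 20-byte hashes described later; the function name and argument layout are my own guess at how a client might recompute the path it sends.

```python
# Toy sketch: recompute the hash path for a store request, leaf upward.
# Assumes groups are concatenations of 16 twenty-byte SHA-1 hashes, as in
# the later description; names and argument layout are mine.
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def updated_path(path, nibbles, new_page):
    """Return the replacement hash groups (root first) and the new root hash.
    `path` lists the old groups from root level down; `nibbles` gives the
    index of the affected hash within each group."""
    new_path = list(path)
    child_hash = sha1(new_page)
    for level in range(len(path) - 1, -1, -1):
        i = nibbles[level]
        g = new_path[level]
        new_path[level] = g[:20 * i] + child_hash + g[20 * (i + 1):]
        child_hash = sha1(new_path[level])
    return new_path, child_hash  # groups to send, plus the root the client keeps
```

The client ships the new page and the recomputed groups to the store and records the new root hash itself.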

Store is Hash Savvy

In this case a read request for a page indicates how much of the hash path is to be returned along with the page. The store has either stored these hash groups or can compute them.

Store requests merely present the data and its address.

This provides some implementation flexibility. The stored hash groups have no external address in this scheme.

Comparison

I like the savvy scheme now. We may accumulate further ramifications here.

The radio link can lead to an agent in the network that forwards store requests to two different stores and read requests to one or the other. This employs redundant stores without extra radio traffic. The client may want to verify acknowledgments of store requests independently in order to make both stores accountable. This scheme greatly decreases vulnerability to any single store.

Regarding Checkpoints

A signal to the store might identify a root hash as one to hold onto in the sense of keeping the entire tree down to the page states until further notice. This connects with the big store ideas. It is an additional wrinkle to those ideas in that substantial sharing between assets occurs and pricing may become more complex. This idea may supplant the idea of versions mentioned below.

These ideas so far seem to preclude sharing at the store. Books and big software seem like logical candidates for sharing. It seems unnecessary to encrypt books out of copyright or open source software. Other material must probably be encrypted on the link. Dissemination of a crypto key might control access to encrypted information. Alternatively the store might embrace capability discipline and deliver protected data as if it were mapped into the address space in the above sense of “address”.

If checkpoints are identified by hash then there needs to be a safe place to store those hashes — safe from dropping the client in the lake. A simple circular page buffer at the store, per client, which keeps the most recent versions of that page and can be retrieved by password seems simple. This is a funny way to bottom out! Another way is for the client to send e-mail to the client’s owner with the root hash for checkpoints.

Store Logic

When the store is notified to hold on to some particular root hash it marks the top-level group as hold. If a subsequent message arrives to modify some page state then the top group is saved and the modification is made recursively to the next lower group. Log(n) groups are displaced (but saved), including the old page state, which the new page state supplants. Further messages to modify page states will displace fewer groups, depending on locality of reference. This saved data suffices to recover page states corresponding to older checkpoints. The invariant is that each page state that the store is committed to keep is under (just one) hash group that is marked hold, or has already been saved. Indexing such saved data is an interesting exercise but I won’t pursue it here, except to note that some batch processing may be suitable, since access to such stale data is likely to come in bursts.
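A toy model of that copy-on-write descent. The class and flag names are mine; I mark groups created since the hold as fresh so they are not saved again, and I gloss over distinguishing pre-hold from post-hold pages at the bottom.

```python
# Toy model of the store's hold logic: groups created before the current hold
# are saved aside the first time a modification displaces them; groups created
# afterward (fresh) are not. Names are illustrative, not from the design.

class Group:
    def __init__(self, children, fresh=False):
        self.children = children  # child Groups, or page states at the bottom
        self.fresh = fresh        # created since the hold was declared?

saved = []  # displaced groups and pages, kept to recover old checkpoints

def modify(node, nibbles, new_page):
    """Descend by nibbles, saving pre-hold groups, and splice in the page."""
    if not nibbles:               # bottom: node is the old page state
        saved.append(node)        # (a real store would skip post-hold pages)
        return new_page
    if not node.fresh:
        saved.append(node)        # this group belongs to the held tree
    i = nibbles[0]
    children = list(node.children)
    children[i] = modify(children[i], nibbles[1:], new_page)
    return Group(children, fresh=True)
```

The first write after a hold displaces log(n) groups plus the old page; a later write whose path shares a prefix with an earlier one skips the fresh groups on that prefix and displaces fewer, matching the locality remark above.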
I wrote the stuff below before the stuff above. There remain interesting ideas and questions below.

I want to separate encryption from integrity, for I have been stung by making the natural connection. I will thus speak as if the store holds plain text. We must then require of a crypto plan that small changes to the plain text lead to small changes in what is stored. We may also require a metric over the state space of a file system; perhaps the Hamming metric. Moving the contents of a 10MB file over by 8 bits thus counts as a big change. This, in turn, burdens applications to code text files using updates. Question: does Keykos provide a good place to install code to do this stuff without burdening the app? This is closely related to the question of where to put file compression code in Keykos. Curiously, the .pdf file format already supports updates at the end.

A store state is a dense array of page states. A page state is 4KB in order to match common memory maps. The client holds the root hash of a (Merkle) hash tree of one or two store states. The tree element is the concatenation of 16 SHA-1 hashes, each of a lower level element or of a page. The store keeps the pages of a store state and can reproduce the tree elements, but may well store them too; that is up to the store. Some states are coherent but the store makes no distinction. One message to the store is to return a page and k of the elements above that page. The arguments are page index, version number and k. A client that knows some of the elements can call for just those it doesn’t have. I assume the client holds the elements above the elements and pages that it has. It is not clear how important it is to be able to call for less than a page. The client can verify the page content starting from the root hash.
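That last verification step can be sketched directly from the description above: elements are concatenations of 16 twenty-byte SHA-1 hashes, and the client walks from the trusted root down to the page. Function and argument names are my own.

```python
# Sketch of client-side verification: tree elements are concatenations of
# 16 SHA-1 hashes (20 bytes each); leaves are hashes of 4 KB pages.
import hashlib

def sha1(b: bytes) -> bytes:
    return hashlib.sha1(b).digest()

def verify_page(page, elements, nibbles, root_hash):
    """Check a delivered page against the trusted root hash.
    `elements` runs from the root-level element down to the page's parent;
    `nibbles` selects the relevant child hash within each element."""
    expected = root_hash
    for element, i in zip(elements, nibbles):
        if sha1(element) != expected:
            return False                      # element fails its parent's hash
        expected = element[20 * i : 20 * (i + 1)]
    return sha1(page) == expected             # the page matches its leaf hash
```

A client that already holds some upper elements simply passes the shorter list starting from the first element it needs checked, with the hash it already trusts in place of the root.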

The reason for multiple states at the store is to support checkpoints. The Hamming distance between the two states is typically small and it is necessary to rely on that in the engineering and pricing of the store.

Design question: May a particular page be written to the store more than once per version? If not then a checkpoint by the client could proceed as follows:

Keykos, to date, also provides swapping between checkpoints. In particular more pages may be dirtied between checkpoints than there is RAM. Perhaps more frequent checkpoints would suffice but I am not convinced.