I propose here a business plan for storing other people’s data. An enterprise called a data bank would accept, store and return data under a variety of contracts covering several contingencies.

The purpose of a data bank or archive is to store large bodies of information for long periods of time. I suggest here some protocols and contracts for a data bank and its customers. I then discuss risks, incentives and stratification of the data storage industry. Eric Hughes and jpp@markv.com have pointed out conceptual blunders and vulnerabilities in earlier versions. Perhaps they have been corrected here.

The data bank described here is ignorant of the nature of the data it keeps. A data collection is identified by its secure hash. The bank is motivated mainly by the penalties for loss called for in the contracts it has signed, which are spelled out below. The bank need not even remember who owns the data.

Here are several transactions that a data bank engages in.

Acquire data:
A client anonymously sends a collection of data along with funds sufficient to warrant the bank’s computing its secure hash and holding the data for a few days. The bank knows the data only by its secure hash.
Publish index:
The bank can publish its list of hashes. (This enables data hunters.)
Sell Data:
Anyone can request a piece of data identified only by its secure hash. The bank is free to sell a copy of the data to anyone with the secure hash. The bank negotiates the price.
Selling a (Hat) Check:
The bank will sell a check to anyone who will pay a negotiated price. The check specifies the secure hash of the data, the cost of redeeming the data, and the penalty (liquidated damages) to be paid by the bank upon failure to produce the data. A client proposes the details of a check as follows: Send (SH(acquisition), redemption price, penalty, SH(Secret)) to the bank along with a proposed price. ‘Secret’ is a secret random number chosen by the client for this negotiation. If the bank agrees, it signs and trades the signed message for the proposed price, or it may propose another price. The signed message is the check and is a bearer instrument. The bank can sell multiple checks for the same data. Different checks for the same data may specify different penalties.
Cancel a check:
A holder of a check may sell it back to the bank at a negotiated price, thus releasing the bank from the risk of paying a penalty in the future. The check is canceled by the mere fact that the bank learns and will remember the Secret that produced the SH(Secret) in the check. The bank need not otherwise acquire or maintain a signed check revocation. This also allows the bank to reclaim the physical storage where the data is stored if it is sure that it has not sold other checks for the data.
Access Data:
Any holder of a check can present the check along with the redemption price and demand the data. The data bank must then either produce the data or pay the penalty to the holder of the check. The bank may respond by stating the Secret of the check, thereby proving that the check has been canceled. Alternatively the bank may reply that it cannot produce the data and is thereby subject to the penalty.
Pay Penalty (liquidated damages):
The bank trades the amount of the penalty for the Secret of SH(Secret). Like a spent Chaum DigiCash note, a particular check is canceled once the bank pays the penalty. Alternatively the bank refuses, offering to provide the data instead. As a further alternative, the bank proves that the check is already canceled by stating its Secret. A zero knowledge proof might be used here to prove to the bank that the client in fact holds the Secret. Penalty negotiations can thus proceed without revealing the Secret to the bank. Since the penalty amount is probably the largest sum involved in these transactions, this transaction is most likely to require escrow service. It is also the most difficult transaction to carry out anonymously.
Checks may specify expiration dates, cancellation terms, etc. The bank is explicitly permitted to disseminate the data and may well do so to lay off and reduce risks. In this sense a data bank is like an insurance company that spreads and shares risks. A check may be viewed as a life insurance policy for the data. Penalties for delayed data delivery might also be specified. This would make it easier for the bank to sub-contract data storage.
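
The transactions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SH is taken to be SHA-256, the bank's signature is stood in for by an HMAC under a private key (a real check would need a public-key signature so that anyone can verify it), and the names DataBank, sell_check, cancel, redeem and pay_penalty are hypothetical. Price negotiation and payment handling are left out.

```python
import hashlib
import hmac
import json
import secrets


def sh(data: bytes) -> str:
    """Secure hash SH(.); SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


class DataBank:
    def __init__(self):
        # Stand-in for a signing key; HMAC replaces a real public-key signature.
        self._key = secrets.token_bytes(32)
        self.acquisitions = {}    # SH(data) -> data
        self.known_secrets = {}   # SH(Secret) -> Secret, i.e. canceled checks

    def acquire(self, data: bytes) -> str:
        """Store data; the bank knows it only by its secure hash."""
        data_hash = sh(data)
        self.acquisitions[data_hash] = data
        return data_hash

    def _sign(self, body: dict) -> str:
        message = json.dumps(body, sort_keys=True).encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def sell_check(self, data_hash, redemption_price, penalty, secret_hash):
        """Sign (SH(acquisition), redemption price, penalty, SH(Secret))."""
        body = {
            "data_hash": data_hash,
            "redemption_price": redemption_price,
            "penalty": penalty,
            "secret_hash": secret_hash,
        }
        return {**body, "signature": self._sign(body)}

    def is_genuine(self, check: dict) -> bool:
        body = {k: v for k, v in check.items() if k != "signature"}
        return hmac.compare_digest(self._sign(body), check["signature"])

    def cancel(self, check: dict, secret: bytes) -> bool:
        """Learning the Secret is all it takes to cancel a check."""
        if sh(secret) != check["secret_hash"]:
            return False
        self.known_secrets[check["secret_hash"]] = secret
        return True

    def redeem(self, check: dict, payment):
        """Return ("data", bytes), ("canceled", Secret) or ("penalty_due", amount)."""
        if not self.is_genuine(check):
            raise ValueError("not a check issued by this bank")
        if check["secret_hash"] in self.known_secrets:
            return ("canceled", self.known_secrets[check["secret_hash"]])
        if payment < check["redemption_price"]:
            raise ValueError("redemption price not met")
        data = self.acquisitions.get(check["data_hash"])
        if data is None:
            return ("penalty_due", check["penalty"])
        return ("data", data)

    def pay_penalty(self, check: dict, secret: bytes):
        """Trade the penalty for the Secret; returns the amount owed (0 if none)."""
        if check["secret_hash"] in self.known_secrets:
            return 0  # already canceled, nothing further is owed
        return check["penalty"] if self.cancel(check, secret) else 0
```

A client would pick a random Secret, send sh(Secret) with the proposed terms, and keep the Secret private until it either sells the check back or collects the penalty.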

Risks

Trust may be divided by agreeing on an escrow agent. Upon redemption the bank examines the check to see if it has been canceled. If the bank knows the Secret which produced the SH(Secret) of the check, the check is canceled. If the client is playing by the rules, the client and bank can proceed thru the redemption transaction without escrow and with only the redemption amount at risk of bank fraud. Alternatively a mutually trusted escrow agent takes the check, accepts the redemption payment specified therein from the client, and relays the data from the bank to the client while computing its secure hash. If the secure hash matches that in the check, the escrow agent delivers the payment to the bank. If the hash fails to match, the transaction is aborted and a penalty transaction begins. The bank delivers the penalty to the escrow agent and the client delivers the Secret to the escrow agent. If the hash of the Secret matches that in the check, then the escrow agent delivers the Secret to the bank (canceling the check) and the penalty to the client. The escrow agent need not have long term financial stability as must the bank.
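
The escrowed redemption can be sketched as a single decision function. This is a hedged illustration, not a full protocol: SH is assumed to be SHA-256, the check is a plain dictionary with hypothetical field names, the data is handed over whole rather than hashed as it streams past, and returning the redemption payment to the client when the penalty branch is taken is my assumption, not something the text specifies.

```python
import hashlib


def sh(data: bytes) -> str:
    """Secure hash SH(.); SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def escrow_redeem(check, payment, bank_data, bank_penalty, client_secret):
    """Decide what the escrow agent releases to each party.

    check         -- dict with data_hash, redemption_price, penalty, secret_hash
    payment       -- redemption payment deposited by the client
    bank_data     -- bytes offered by the bank, or None if it offers nothing
    bank_penalty  -- amount the bank deposits if a penalty transaction begins
    client_secret -- the Secret the client deposits in a penalty transaction
    """
    if payment < check["redemption_price"]:
        return {"outcome": "aborted", "client_gets": {"refund": payment},
                "bank_gets": {}}

    # Normal case: the data hashes to the value named in the check.
    if bank_data is not None and sh(bank_data) == check["data_hash"]:
        return {"outcome": "redeemed", "client_gets": {"data": bank_data},
                "bank_gets": {"payment": payment}}

    # Penalty transaction: penalty and Secret are exchanged through the agent.
    secret_ok = sh(client_secret) == check["secret_hash"]
    if secret_ok and bank_penalty >= check["penalty"]:
        return {"outcome": "penalty_paid",
                "client_gets": {"penalty": bank_penalty, "refund": payment},
                "bank_gets": {"secret": client_secret}}  # this cancels the check

    # Either side failed to deliver; everything goes back where it came from.
    return {"outcome": "aborted", "client_gets": {"refund": payment},
            "bank_gets": {"refund": bank_penalty}}
```

The agent holds funds only for the duration of one call, which is why it needs a short term reputation rather than long term financial stability.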

Inflation can damage incentives. Checks might be denominated in gold or currency baskets or whatever.

RSA modulus size is critical for long term contracts. 2K bits of modulus or more may be warranted.

Example

I can imagine the Getty Museum digitizing its Rembrandts and storing the results in a data bank. The data might be insured for $10,000,000. The bank would disseminate the data to increase security and lower its risk. The museum would probably encrypt the data and share the encryption key and hash à la Shamir for safe keeping. The museum would not share the Secret of the check because it wants to be the one paid upon default, and it wants to be sure that no one else cancels the check. It might disseminate the check but not the Secret of the check to others so that they are assured of getting the redemption price for accessing the data.
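
Sharing the encryption key "à la Shamir" refers to Shamir's secret sharing, where any k of n shares recover the key and fewer reveal nothing. A minimal sketch follows, assuming the key is treated as an integer below the Mersenne prime 2^521 - 1; the function names split and combine and the choice of prime are mine, and a production system would want a vetted library rather than this illustration.

```python
import secrets

# Field modulus: the Mersenne prime 2**521 - 1, large enough for a 256-bit key.
P = 2**521 - 1


def split(secret: int, n: int, k: int):
    """Split secret into n shares, any k of which suffice to recover it."""
    if not 0 <= secret < P:
        raise ValueError("secret must lie in [0, P)")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):   # Horner's rule, mod P
            acc = (acc * x + c) % P
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
```

The museum might, for instance, give five trustees one share each with k = 3, so that losing two trustees does not lose the Rembrandts.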

Incentives

A data bank, or any other player, may find it profitable to keep the data even after no uncanceled checks for it remain. It can make money by selling copies of the data. Data banks thus have an incentive to disseminate their list of holdings in the form of hashes, to support data hunters.

Eric Hughes notes an incentive for a bank to take the money and run at some point. If faced with a $10,000,000 penalty, a data banker may be unable or unwilling to pay. If escrow is used then the client who holds a check for the lost data can only damage the reputation of the bank. The reputation may be worth less than the $10,000,000. If the data bank is one and the same as some institution already required to have long term financial stability, this is correspondingly less of a problem. Sellers of life insurance policies and earthquake insurance are in this category. See “Deep pockets”, below, as a suggestion for a solution to this problem.

Design Considerations

It may seem strange that the data bank is willing to sell data to whoever will pay. I suggest this because it is so easy to encrypt the data and not have to trust the bank. You can distribute the key thru whatever channels you transmit the secure hash of the data.

Note that bank clients are always anonymous. Data is never held for some known person. Data may be held solely for speculation. The purpose of the penalty is to motivate the bank to keep data when there is no reason for the bank to forecast sales revenue. Unlike Chaum bank notes, the issuance of a hat check may be linked to its redemption. The depositing of data and hat check issuance, however, may be anonymous. Data redemption may be anonymous, but collecting a substantial penalty may be difficult to arrange anonymously. Managing anonymous transactions is a difficult but orthogonal issue.

One way to manage anonymous data acquisition is to emulate TCP’s ability to assemble packets that are reordered, redundant and missing, into a complete whole. Forward error control (ECC & such) applied across packets can alleviate missing packets.
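
As a rough illustration of this idea, the sketch below splits data into indexed packets plus one XOR parity packet, then reassembles them when they arrive reordered, duplicated, and with at most one packet missing. The packet layout, the index -1 for the parity packet, and the function names are all hypothetical; real forward error control schemes tolerate more than one loss.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_packets(data: bytes, size: int):
    """Return (index, payload) packets plus a parity packet with index -1.

    Assumes data is non-empty. The last chunk is zero-padded to `size`.
    """
    chunks = [data[i:i + size].ljust(size, b"\0")
              for i in range(0, len(data), size)]
    parity = reduce(xor_bytes, chunks)
    return list(enumerate(chunks)) + [(-1, parity)]


def reassemble(packets, count: int, length: int) -> bytes:
    """Rebuild the data from packets that may be reordered or duplicated.

    count  -- number of data packets (excluding parity)
    length -- original data length, used to strip the padding
    """
    got = {}
    for index, payload in packets:
        got[index] = payload          # duplicates simply overwrite

    missing = [i for i in range(count) if i not in got]
    if len(missing) > 1 or (missing and -1 not in got):
        raise ValueError("too many packets lost to recover")
    if missing:
        # XOR of the parity with every surviving chunk yields the lost chunk.
        survivors = [got[i] for i in range(count) if i in got] + [got[-1]]
        got[missing[0]] = reduce(xor_bytes, survivors)

    return b"".join(got[i] for i in range(count))[:length]
```

The receiving bank can confirm the result by computing its secure hash, which is the only name the data has anyway.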

The Bank’s State

Logically the bank can perform all of these transactions by merely keeping the unordered set of acquisitions. It is practically necessary to index these by their secure hash, but this index can be rebuilt from the acquisitions themselves. When it loses data it must keep canceled checks to avoid extra penalties. The bank need not keep records of checks that it has sold unless it wants to know when it can delete acquisitions. The bank will keep a list of Secrets indexed by SH(Secret) in order to detect canceled checks. It may want to keep marketing information to know when acquisitions are worth keeping merely to sell copies. The bank will need to keep records of the checks that it issues for financial auditors (to satisfy owners of the bank).
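
The state described here is small enough to sketch directly. The following assumes SH is SHA-256 and represents checks as dictionaries with hypothetical field names; it shows the index being rebuilt from the bare acquisitions, the canceled-check test, and the optional bookkeeping that tells the bank when an acquisition may be deleted.

```python
import hashlib


def sh(data: bytes) -> str:
    """Secure hash SH(.); SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def rebuild_index(acquisitions):
    """Rebuild the hash index from an unordered collection of acquisitions."""
    return {sh(blob): blob for blob in acquisitions}


def is_canceled(check: dict, known_secrets: dict) -> bool:
    """A check is canceled once the bank knows the Secret behind SH(Secret)."""
    return check["secret_hash"] in known_secrets


def deletable(data_hash: str, issued_checks, known_secrets: dict) -> bool:
    """True when every check sold for this data has been canceled.

    This needs the optional record of issued checks; without it the bank
    cannot tell when it is safe to drop an acquisition.
    """
    return all(is_canceled(c, known_secrets)
               for c in issued_checks
               if c["data_hash"] == data_hash)
```

Note that known_secrets maps SH(Secret) to the Secret itself, so the bank can both detect a canceled check and prove the cancellation by stating the Secret.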

Bank Strategies

As in insurance, banks can reduce penalty risks by subcontracting with other banks. There is then a risk that in some cycle of banks, each depends on the next to have the data. A bank has an incentive to occasionally demand portions of the data that it has contracted for. Any such cycles are thus detected early. A bank should recompute the hash occasionally for any data for which it is liable for loss. If the data is duplicated then the bank can buy repairs from another bank.
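
The periodic audit and repair might look like the sketch below. It assumes SH is SHA-256, that the bank's store is a dictionary from hash to bytes, and that peer_fetch is a hypothetical callable standing in for buying a copy from another bank (returning None when the peer cannot supply it).

```python
import hashlib


def sh(data: bytes) -> str:
    """Secure hash SH(.); SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def audit(store: dict):
    """Recompute every hash and return the hashes whose data no longer matches."""
    return [h for h, blob in store.items() if sh(blob) != h]


def repair(store: dict, damaged, peer_fetch):
    """Try to replace damaged entries with copies bought from a peer.

    Returns the hashes that are still damaged afterwards. A peer's copy is
    accepted only if it hashes correctly, so a peer that merely relies on
    someone else for the data (the cycle problem) is exposed right away.
    """
    still_damaged = []
    for h in damaged:
        blob = peer_fetch(h)
        if blob is not None and sh(blob) == h:
            store[h] = blob
        else:
            still_damaged.append(h)
    return still_damaged
```

For example, repair(store, audit(store), other_bank.get) would work against a peer modeled as a plain dictionary.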

Stratification

Perhaps the long term data storage industry can be divided into the following pieces:
  • Deep pockets: someone whose reputation is likely to be worth more than any penalty. Deep pockets must know who knows how to store data well, and must be able to convince clients that he will be around for a long while. Deep pockets makes his money from selling hat checks.
  • Data storage specialists: These guys may come and go as is typical in new technologies. They have a continuous income stream from Deep pockets. They occasionally deliver stable data storage media with bits on them to Deep pockets for their keep.
  • Escrow agents: These know the protocols and maintain short term reputations.
  • Data hunters: These merely index secure hashes and track who has the corresponding data. Given a hash they can tell you which banks have the data. This might be the ultimate URL or URI server.

    Ted Anderson has made some proposals along the same lines. It would be good to compare them in detail. There are references there to notes with ideas similar to these. I don’t claim priority here.

    Tahoe is a promising technology upon which a data bank might well be based. The ideas presented on this page merely presume infrastructures such as Tahoe.