Suppose that I want to remember a secret, such as a 100 bit number, for several years and that I do not want to write it down inside a computer or out. For instance it might be the seed of my private key. I will suppose here that I can keep a reliable record somewhere but am unable to keep that record secret. Web sites that require a personal password frequently record other obscure information, such as mother’s maiden name, as an alternate identification means. I propose here a similar but more secure scheme based on that insight.
When we are young we learn quite a few things that we probably will not forget. Many of these are known only to ourselves. I think that there is no one alive, nor any record of the name of my first dog or my first cat, yet I remember those names well. Such memories may provide sufficient entropy for a long lived secret.
There are two parts to the scheme: The early part is when I form the record, and the late part is when I use the record and my memories to retrieve the secret.
In the early part I collect such memories arranged as an unordered set of keyword-value pairs. I select a threshold, t, which is a number between 0 and 1 and that is the fraction of the memories that must be combined with the record to recover the secret. I thus select at the beginning a tolerable amount of memory loss. I record the keywords in the record and, of course, omit there any clue as to the values. The Shamir secret sharing algorithm (SSS) can be used during the second part to recover the entire entire secret from the record and a sufficient portion of the memories.
The Shamir algorithm has two phases: The early phase produces several shares from the secret and the late phase combines a subset of the shares to recover the secret. Each share is the same size as the secret.
Here is how we adapt SSS:
The early algorithm takes the keyword-value pairs, enumerates them, and includes an enumerated list of keywords in the record. It next invokes the early portion of SSS providing:
Our routine now combines these portions, perhaps with exclusive or, with the enumerated secret values broken into fixed size hunks. Each hunk is hashed to a bit string the length of the secret. These combined values are then included in the record.
It is important that each of these hunks carry about the same entropy if the quorum logic is to be effective. For instance if 10 right answers constitutes a quorum and there are 10 questions each providing about 1 bit of information, then the attacker has a trivial job. An estimate must be made of the entropy in each answer, either by the software or by the user. Questions with high entropy answers might be divided into more than one hunk.
For recovery of the secret, the record is used to prompt for most of the memories, producing the same values which can once more be divided into the same hunks, recombined with the values in the record to retrieve the secret portions from the early phase of SSS. These can now be delivered to the late phase of SSS which will then return the original secret.
There is a delicate pitfall to be avoided here that requires some careful analysis that I will not do just now. You probably want at least 90 bits of entropy to guard against offline attacks. Some of the questions, which are available to the attacker, will suggest only a few bits of entropy in the answer, often just one. If there are too many such questions and each forms a hunk, the attacker may form a quorum by guessing answers to such simple answers. One protection against such an attack is to combine simple questions into a single hunk, and do this in such a way as to make each hunk contain nearly the same amount of entropy. This may require the late phase to combinatorially guess at answers that are forgotten. The attack program has thus much in common with the legitimate late phase program.
The question is of authorship, as contrasted to plagiarism. A person who wishes to claim authorship submits to the following test: The text in question is automatically modified to replace every 5th word by a blank. It has been observed that the real author is able to replace most of those words correctly while another person, even if he is familiar with the same subject, cannot.
To provide the same function as above some 1000 or so words of text composed by the principal are chosen. Every 5th word is hashed to about 2 bits which yields about 400 bits. These bits are submitted to a error correcting scheme that can recover after 100 of the 400 bits have been corrupted. The bits are used to produce the private key.
See this for a crude description of an attack on this scheme.