This is a “thinking out loud” note on how to define and implement a Light Cone Checkpoint Mechanism in a system such as Keykos.

Assume 32 EOs (Event Objects), which are kernel objects to which there are rescindable capabilities. An EO can be exclusively acquired by an application and used to gain the advantages of low-cost checkpoints for that application, despite the existence of other applications on the machine which preclude frequent classic system-wide checkpoints.

Some invocation of the EO is like ordering a system-wide checkpoint via the closely held checkpoint key. All of the state within the total system that has been “contaminated” by actions of the invoker is captured on stable store (or offsite) before the order finishes. Furthermore, if this state is small, the order will be quick. Any subsequent restart will be from a state that includes all of the state visible to the invoker.

I imagine that each page and each node (the sole sites of state in Keykos) has a 32-bit contamination vector, CV, with one bit per EO. As any signal moves from one thing (page or node) to another, the CV of the message origin is ORed into the CV of the destination. In this regard a node fetch is regarded only as a message from the node to the domain nodes, and a data fetch is considered a message from the consulted nodes and page of the memory tree to the nodes of the domain. Clearly some optimization is required, but I postpone those considerations.
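The OR rule above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not kernel code: the names Thing, acquire, signal and contaminated are hypothetical, and the note does not say how an EO's bit first gets set, so acquire simply marks one thing as the origin of contamination for that EO.

```python
from dataclasses import dataclass

NUM_EOS = 32                       # one bit per Event Object, as in the note
CV_MASK = (1 << NUM_EOS) - 1       # keep the vector to 32 bits


@dataclass
class Thing:
    """A page or a node, the only sites of state in this toy model."""
    name: str
    cv: int = 0                    # contamination vector


def acquire(eo: int, thing: Thing) -> None:
    """Assumption: mark `thing` as the origin of contamination for EO `eo`."""
    thing.cv |= 1 << eo


def signal(src: Thing, dst: Thing) -> None:
    """Any signal from src to dst ORs the origin's CV into the destination's."""
    dst.cv = (dst.cv | src.cv) & CV_MASK


def contaminated(things, eo: int):
    """Things whose state a checkpoint ordered through EO `eo` must capture."""
    bit = 1 << eo
    return [t for t in things if t.cv & bit]


app = Thing("app_domain_node")
page = Thing("shared_page")
other = Thing("other_domain_node")
bystander = Thing("bystander_node")

acquire(3, app)
signal(app, page)      # the application stores into a page
signal(page, other)    # another domain later fetches from that page
names = [t.name for t in contaminated([app, page, other, bystander], 3)]
print(names)  # ['app_domain_node', 'shared_page', 'other_domain_node']
```

The bystander node never receives a signal from a contaminated thing, so it stays out of the set that the order must write to stable store. That is why the order can be quick when the contaminated state is small.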


I recommend “Finding Consistent Global Checkpoints in a Distributed Computation” for a more careful introduction to these ideas. In place of their “processes”, map our pages and nodes. Much of their apparatus is unneeded in Keykos, for the kernel is free to create “checkpoints” for pages and nodes between arbitrary “messages”. This is in part because the Keykos semantics for sending and receiving messages is atomic, with well-defined states before and after each message. This paper also reminded me of the theoretical problem of exporting checkpoint logic from the kernel. That's not easy just now.
Consider whether these ideas overlap the techniques posited in the note on isolation.