See (new-ideas,03) for related ideas.
Real computers experience hardware and software failures.
The checkpoint-restart facility is a method Gnosis uses
to recover from failures and scheduled down time.
From time to time, Gnosis takes a system-wide checkpoint.
This is a record of the state of all pages and nodes, including
which nodes have processes in them.
At any instant there are two {sometimes three} complete
states of the system available: the current state and the
state at the last checkpoint. In normal operation the checkpoint
states are not referred to. The current state is represented,
in part, by information in volatile memory {e.g., in machine
registers}. The checkpointed state is represented only on
disk.
The kernel makes a checkpoint on two occasions: when a certain
preallocated area of disk used to represent checkpoints
is nearly full, or when the checkpoint key {(p2,checkpoint)}
is called.
The checkpoint key will be called periodically {perhaps
every few minutes} by some domain.
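
The two occasions above can be sketched as a single predicate. This is
only an illustration: the class, the function, and the 90 percent
threshold are assumptions made for the example, since the text says
only that the area is "nearly full".

```python
from dataclasses import dataclass


@dataclass
class CheckpointArea:
    """Hypothetical model of the preallocated disk area for checkpoints."""
    capacity_pages: int
    used_pages: int = 0
    nearly_full_percent: int = 90  # assumed threshold; the text gives no figure

    def nearly_full(self) -> bool:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return self.used_pages * 100 >= self.capacity_pages * self.nearly_full_percent


def should_checkpoint(area: CheckpointArea, key_called: bool) -> bool:
    """True on either occasion: the area is nearly full, or the key was called."""
    return key_called or area.nearly_full()
```

A domain that calls the checkpoint key every few minutes corresponds to
passing key_called=True; the kernel's own trigger corresponds to the
nearly_full test.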
{arcane}Making a checkpoint normally appears
to happen instantaneously.
The real work involved is copying, from disk to disk,
the pages and nodes that have changed since the previous
checkpoint. This work is overlapped with normal execution
and its nature is of concern only to the holder of the migrate
key and those concerned with overall system performance.
Under some conditions it may be necessary to take another
checkpoint before this copying has finished.
This will cause a delay in normal processing.
The migrate key described in (p2,migrate) must be properly
used to ensure that these checkpoints may continue to be
taken.
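
The interaction between checkpoints and the overlapped copying can be
pictured with a toy model. Everything in it is hypothetical (the class,
its fields, and the one-item-per-step copying); it is not the kernel's
algorithm. It shows only that a checkpoint is logically instantaneous,
and that a second checkpoint taken before the copying has finished must
wait for it.

```python
class CheckpointModel:
    """Toy model; names and structure are illustrative, not Gnosis internals."""

    def __init__(self, initial):
        self.current = dict(initial)  # live state of pages and nodes
        self.home = dict(initial)     # checkpoint copies already migrated
        self.pending = {}             # checkpointed but not yet migrated
        self.dirty = set()            # changed since the previous checkpoint
        self.stalled_steps = 0        # copying done while processing waited

    def write(self, name, value):
        self.current[name] = value
        self.dirty.add(name)

    def migrate_step(self):
        """Copy one pending item to its home location; False when idle."""
        if not self.pending:
            return False
        name, value = self.pending.popitem()
        self.home[name] = value
        return True

    def take_checkpoint(self):
        # If the previous checkpoint is still being copied, normal
        # processing waits here until that copying has finished.
        while self.migrate_step():
            self.stalled_steps += 1
        self.pending = {name: self.current[name] for name in self.dirty}
        self.dirty.clear()

    def checkpoint_state(self):
        """State a restart would see: home copies overlaid with pending ones."""
        return {**self.home, **self.pending}
```

In this model, calling migrate_step between checkpoints plays the part
of the holder of the migrate key: if it is called often enough,
take_checkpoint never stalls.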
When Gnosis recovers from a failure, it restarts from
the state at the last checkpoint. All pages and nodes are
rolled back to their state at the time of the last checkpoint.
The journal page {v.i.} is the only exception.
Nodes with processes will begin running from their previous
state. Often, they will repeat what they were doing at the
time of the checkpoint.
Certain things of necessity are not rolled back. These
include the real-time clock and things outside of Gnosis,
such as the memory of users. I/O device keys are rescinded
upon restart.
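
The restart rules can be summarized in a short sketch. The SystemState
record and the restart function are hypothetical names invented for the
example. The sketch also assumes that the journal page keeps the
contents it had at the time of the failure; this section says only that
it is an exception.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical summary of system state for this illustration."""
    objects: dict      # pages and nodes, by name
    journal: list      # stand-in for the journal page
    device_keys: set   # I/O device keys currently held
    clock: float       # real-time clock reading


def restart(checkpoint: SystemState, at_failure: SystemState, now: float) -> SystemState:
    """Build the state seen after a restart from the last checkpoint."""
    return SystemState(
        objects=dict(checkpoint.objects),   # rolled back to the checkpoint
        journal=list(at_failure.journal),   # assumed to survive the failure
        device_keys=set(),                  # rescinded upon restart
        clock=now,                          # the clock is not rolled back
    )
```

A process that resumes after restart therefore sees its own pages and
nodes as of the checkpoint, but a later clock and no usable I/O device
keys.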