In conventional systems it is commonplace to restart the system when some apparently transient event has corrupted its state. Whether such events are software bugs or transient hardware errors, restarting recovers from the great majority of them. A persistent system restarts from its most recent checkpoint rather than from scratch, so there is a danger that the checkpoint itself is corrupted. The kernel maintains many invariants, in the computer science sense of the word. These are documented in the kernel logic manual and are also embodied in kernel code called “CHECK”, which runs just before each checkpoint is taken. This usually provides an opportunity to recover an uncorrupted state in those cases where the transient error occurred while in privileged mode.

Certain very critical user mode programs, such as the space bank, have run for several years without suffering a fault. The bank is not especially large and seems to be bug free. The kernel has run for about a year between crashes. The IBM/370 series seemed to be somewhat more reliable yet.
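The idea of checking invariants just before a checkpoint can be illustrated with a toy model. The sketch below is not the kernel's CHECK code: the class `ToySystem`, its page-count invariant, and the choice to roll back to the previous checkpoint when a check fails are all assumptions made for illustration. The text above says only that the check "provides an opportunity to recover an uncorrupted state" and does not say how recovery proceeds.

```python
import copy


class InvariantViolation(Exception):
    """Raised when a consistency check on the toy state fails."""


class ToySystem:
    """Hypothetical model: state is only checkpointed after it passes a check."""

    def __init__(self):
        # Toy state: a fixed pool of pages, each either free or allocated.
        self.state = {"free": 8, "allocated": 0, "total": 8}
        # The initial state is taken as the first known-good checkpoint.
        self.last_checkpoint = copy.deepcopy(self.state)

    def allocate(self, n):
        # A well-behaved operation that preserves the invariant.
        self.state["free"] -= n
        self.state["allocated"] += n

    def check(self):
        # Assumed invariants for this toy: counts are non-negative and
        # every page is accounted for exactly once.
        s = self.state
        if s["free"] < 0 or s["allocated"] < 0:
            raise InvariantViolation("negative page count")
        if s["free"] + s["allocated"] != s["total"]:
            raise InvariantViolation("page counts do not sum to total")

    def checkpoint(self):
        """Run the check first; commit the state only if it passes.

        Returns True if a new checkpoint was taken, False if the check
        failed and the state was restored from the previous checkpoint
        (an assumed recovery policy, not one stated in the text).
        """
        try:
            self.check()
        except InvariantViolation:
            self.state = copy.deepcopy(self.last_checkpoint)
            return False
        self.last_checkpoint = copy.deepcopy(self.state)
        return True


system = ToySystem()
system.allocate(3)
first = system.checkpoint()    # state is consistent, checkpoint is taken

system.state["free"] += 1      # simulated transient corruption
second = system.checkpoint()   # check fails, corrupted state is not committed
```

In this sketch the corrupted state never reaches the checkpoint, so a restart would resume from the last state that passed the check. A check of this kind can only catch corruption that violates one of the stated invariants.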
See this about further ramifications of persistence, and this about the time warp at the interface to the real world.
Other notes on persistence