The first time a Gnosis system is started, a very limited, pre-defined set of processes is initiated. {See (p1,bb).} All of the system's memory is available in a single pool, or a set of pools, as determined by the system administrator. These pre-defined processes then create an environment suitable for application development or execution, according to rules specified by the system administrator. The bootstrapping process is largely self-contained and is likely to be customized at some sites; such customization is currently done by modifying certain assembly-language programs.
Once the system has completed the bootstrapping process, it runs continuously {at least as far as the programs inside it can tell}. Service interruptions caused by hardware and software failures, as well as changes to the hardware configuration and the kernel software, are hidden from Gnosis programs by a system-wide checkpoint and restart mechanism.
A Gnosis system takes a checkpoint at regular intervals {currently about every 5 minutes} and whenever certain system resources become depleted. The checkpoint appears to happen instantaneously, with little or no impact on executing programs. The checkpoint process is an integral part of the Gnosis paging mechanisms and does not introduce excessive system overhead.
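The trigger rule described above (a timer, or a depleted resource, whichever comes first) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Gnosis implementation: the text does not say which resources are monitored, so the free-page count, the `low_water_mark` threshold, and all class and method names here are hypothetical.

```python
from dataclasses import dataclass

# "Currently about 5 minutes", per the text.
CHECKPOINT_INTERVAL_S = 5 * 60


@dataclass
class CheckpointPolicy:
    """Hypothetical trigger rule: checkpoint on timer expiry OR resource depletion."""
    interval_s: float = CHECKPOINT_INTERVAL_S
    low_water_mark: int = 64          # assumed threshold on free disk pages
    last_checkpoint_s: float = 0.0    # time of the most recent checkpoint

    def should_checkpoint(self, now_s: float, free_pages: int) -> bool:
        timer_expired = now_s - self.last_checkpoint_s >= self.interval_s
        resource_low = free_pages < self.low_water_mark  # assumed resource
        return timer_expired or resource_low

    def record_checkpoint(self, now_s: float) -> None:
        # Restart the interval timer once a checkpoint completes.
        self.last_checkpoint_s = now_s
```

For example, with the defaults, `should_checkpoint(100.0, 1000)` is false, while either `should_checkpoint(300.0, 1000)` (interval elapsed) or `should_checkpoint(100.0, 10)` (pages depleted) is true.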
If the system does crash for any reason, an automatic recovery procedure is invoked {assuming the computer is operational} to restart all of the programs that were running at the time of the most recent checkpoint. All of the data associated with those programs is also “restored” to its state at the time of that checkpoint, so programs and data remain completely synchronized in spite of the failure.
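The key property here is that program state and data are captured and restored together, so neither can get ahead of the other. The toy model below is a hypothetical sketch of that property only: it represents each process by a program counter and all data by a page table, and it copies everything at each checkpoint for brevity. The text says the real checkpoint is integrated into the paging mechanisms, so an actual implementation would not copy all state eagerly like this.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToySystem:
    # Hypothetical model: each process has a program counter,
    # and all data lives in numbered pages.
    pcs: dict[str, int] = field(default_factory=dict)
    pages: dict[int, bytes] = field(default_factory=dict)
    _snapshot: Optional[tuple] = None

    def checkpoint(self) -> None:
        # Capture program state and data state together, as one unit.
        self._snapshot = (copy.deepcopy(self.pcs), copy.deepcopy(self.pages))

    def crash_and_recover(self) -> None:
        # Discard all work since the last checkpoint. Programs and data
        # roll back together, so they stay mutually consistent.
        if self._snapshot is None:
            raise RuntimeError("no checkpoint to recover from")
        pcs, pages = self._snapshot
        self.pcs = copy.deepcopy(pcs)
        self.pages = copy.deepcopy(pages)
```

Usage: after `s = ToySystem(pcs={"editor": 100}, pages={0: b"v1"})` and `s.checkpoint()`, any later changes to `s.pcs` or `s.pages` are undone by `s.crash_and_recover()`, which returns both to `{"editor": 100}` and `{0: b"v1"}`.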
To modify either the kernel or the system hardware configuration, a system checkpoint is requested and all processes are temporarily suspended. When the checkpoint has completed, the system disk packs may be moved to the new computer system and a new version of the kernel loaded. When the system resumes operation, each program that was running resumes at the instruction following its position at the time of the checkpoint.
To modify a domain program or algorithm, logic must be provided in another domain program that has the authority to make the modification. Since an object may be arbitrarily complex, it does not appear possible to provide a system-wide facility for installing such upgrades. Indeed, when reliability and stability are the primary concerns of a production application, Gnosis users may deliberately choose not to adopt new versions of components.
Because of the foregoing, we anticipate some difficulties in providing upgrades for both application and system-provided programs, at least in the near term. I would suggest that until we have more experience in this area, it would be wise for application designers to provide for the possibility of “big bangs” during the lifetime of the application.
The system-wide checkpoint facility guarantees consistency between programs and data across system failures. For applications that must stay synchronized with entities outside the Gnosis system, such as users at terminals, other computers, or printers, a logical checkpoint facility called the system journal is provided. The journalizing facility is immune to the system-wide rollback to the last checkpoint that occurs after a failure, which allows application-dependent code to guarantee the integrity of certain operations.
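To see why a journal that survives rollback matters, consider an application that prints an invoice after the last checkpoint and then the system fails. After restart, ordinary state says the invoice was never printed, but the paper has already left the printer. The sketch below is hypothetical throughout (the `System` class, `print_invoice`, and the journal entry format are invented for illustration and are not the Gnosis journal interface): ordinary state is rolled back on failure, while the journal list is deliberately left untouched, so application code can consult it to avoid repeating the external action.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class System:
    state: dict = field(default_factory=dict)      # rolled back on failure
    journal: list = field(default_factory=list)    # hypothetical: never rolled back
    _snapshot: dict = field(default_factory=dict)

    def checkpoint(self) -> None:
        self._snapshot = copy.deepcopy(self.state)

    def fail_and_restart(self) -> None:
        # Ordinary state returns to the last checkpoint; the journal is untouched.
        self.state = copy.deepcopy(self._snapshot)


def print_invoice(system: System, invoice_id: int, printer: list) -> None:
    # Consult the journal before performing an irreversible external action.
    if ("printed", invoice_id) in system.journal:
        return
    printer.append(invoice_id)                     # the external side effect
    # A crash between the line above and the line below could still
    # duplicate output; real code would need device-specific handling.
    system.journal.append(("printed", invoice_id))
    system.state["last_printed"] = invoice_id
```

In this sketch, if the system checkpoints, prints invoice 7, and then fails, the restarted application sees no `last_printed` entry in its state, yet calling `print_invoice` again does not reprint, because the journal still records the earlier print.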