Booting Computers & Networks

There comes a time when whoever operates a computer wishes he had started another program instead of the one that is running. How do we make a computer stop executing the current program and start another that is not even in memory yet? The same problem arises when power is first turned on. Today the problem is called “booting the machine”.

Before ROM was commonplace, the operator directly invoked a hardware function that caused the computer to cease operation, read some data from a conventional I/O device, and then begin executing that data as code. Machines with ROM come with ROM code that is prepared to take control while making no assumptions about the state of the system beyond a few spelled out in the CPU specifications under “non-maskable interrupt”. A hard reset or power-on causes the machine to execute this code, abandoning whatever it might have been doing.

Whatever the underlying hardware function, there are operating systems that presume to reduce the need to reboot the system to nearly zero. A debugged and properly designed kernel can do this except for power up and transient hardware failures.

The Computer Network

Local Networks

There may be tricks here that are useful. When Seymour Cray’s 6600 was booted, each of the 10 PPUs ceased operation, and all but the first began to wait for input on a distinct hardware data stream (called a channel) available to each of the processors. The first PPU executed a 12-word program held in operator-settable switches. Once the first PPU had been properly rebooted, it could feed initial programs to each of the waiting PPUs over the shared data streams. If a PPU program failed, that PPU remained unavailable until the whole system was rebooted.

Multi-processor clusters have a variety of solutions. In an array of computers where each computer can talk only to its neighbors, some sort of pecking order must be established; otherwise every program must be trusted at all times not to reboot the whole system.

Distributed Networks

Sometimes distributed computer networks are programmed and operated centrally, and no operator is in a position to push the boot buttons. We cannot afford extra comm gear just to reboot. I think that two conditions must be maintained:

No unsigned code is executed in the remote system.
The system is always responsive to a signed request to load new code.

???

Let me take another run at this. I think that the pattern used to remotely signal a reboot must be interpreted by what I call boot code here. Boot code’s integrity requires that it is never modified and is always in place to examine each incoming message for the reboot signal. I see two ways of ensuring this:

Trust each successive signed software version to maintain the rule for booting the next version.
Use the classic two-state (privileged and unprivileged) feature of most modern microprocessors to guard the code that keeps the rule, despite erroneous or malicious signed code.

In either case the boot code must take control upon packet arrival interrupt and include any error control necessary for the reboot logic. It must also include authentication logic.
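Here is a minimal sketch of what such a check might look like, to make the requirement concrete. Everything in it is an assumption made for illustration: the message layout, the names make_reboot_request and check_packet, and the version counter that rejects replayed requests. It also substitutes a shared-key HMAC for a true public-key signature, only because that is what the Python standard library offers; the tag doubles as the error control, since a damaged packet fails the same test as a forged one.

```python
import hashlib
import hmac
import struct

# Hypothetical message layout (an assumption, not any real protocol):
#   4-byte magic | 8-byte big-endian version | new code | 32-byte HMAC tag
MAGIC = b"BOOT"
HEADER_LEN = len(MAGIC) + 8
TAG_LEN = hashlib.sha256().digest_size


def make_reboot_request(key: bytes, version: int, code: bytes) -> bytes:
    """Build a reboot request as the central operator would."""
    body = MAGIC + struct.pack(">Q", version) + code
    return body + hmac.new(key, body, hashlib.sha256).digest()


def check_packet(key: bytes, current_version: int, packet: bytes):
    """Return (version, code) for a valid reboot request, else None.

    Meant to run on every incoming packet; ordinary traffic yields None.
    """
    if len(packet) < HEADER_LEN + TAG_LEN or not packet.startswith(MAGIC):
        return None  # not a reboot request at all
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # corrupted in transit or forged
    (version,) = struct.unpack(">Q", body[len(MAGIC):HEADER_LEN])
    if version <= current_version:
        return None  # replay of an old request (assumed policy)
    return version, body[HEADER_LEN:]
```

A caller would hand each arriving packet to check_packet and load the returned code only when the result is not None. Whether this routine lives in each signed software version or behind a hardware privilege boundary is exactly the choice between the two approaches above.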

How do we enforce a rule that

I have heard rumors that certain Ethernet cards are wired to reboot the machine upon receipt of magic packets. This is a bad idea.

Troubles with Booting