Bell System Reset Taxonomy
Sometime in the early 1970s I heard a detailed description of the error strategy for one of the Bell System programmed exchanges.
I suspect that it was for either 1ESS or 5ESS, or it might have been a policy aimed at all of their programmed ‘central offices’.
My recollection of it is rather vague in detail, yet I was impressed at the time by the benefits of categorizing disasters: giving events names so that you could think and communicate about what level of event you were dealing with.
Terminology (I mostly forget AT&T’s terminology, though I recall phrases such as “level 4 reset”):
- Switching hardware configuration: what system planners and plant personnel do to the hardware.
- Switch state: what the software does to the switching hardware.
- Trunk line: communication equipment carrying many calls to other Bell central offices.
I think that there was a hierarchy of 7 levels of incident.
My memory of all but the first two and the last is vague, and I fabricate here a bit.
I recall that the software’s responses were cumulative: any state lost at one level was necessarily lost at all higher levels.
(This is incompatible with my reconstruction of the levels.)
I think there were other useful generalities.
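The cumulative property can be stated as a simple check. In this minimal sketch each reset level is a number mapped to the set of state categories it discards, and a hypothetical function `is_cumulative` (an illustrative name, not anything from AT&T) confirms that every level loses at least what the level below it loses. The state labels in the examples are made up.

```python
def is_cumulative(lost_state: dict[int, set[str]]) -> bool:
    """Check that state lost at any reset level is also lost at every higher level.

    `lost_state` maps a level number to the (hypothetical) categories of state
    discarded at that level. Comparing adjacent levels suffices, because the
    subset relation is transitive.
    """
    levels = sorted(lost_state)
    return all(lost_state[lower] <= lost_state[higher]
               for lower, higher in zip(levels, levels[1:]))


# Made-up labels: each level drops everything the previous one did, plus more.
nested = {
    1: {"digits being dialed"},
    2: {"digits being dialed", "some calls"},
    3: {"digits being dialed", "some calls", "all calls"},
}
# A made-up assignment that breaks the property: level 4 drops hardware
# configuration but keeps the calls that level 3 drops.
not_nested = {3: {"all calls"}, 4: {"hardware configuration"}}

print(is_cumulative(nested))      # True
print(is_cumulative(not_nested))  # False
```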
- Some subscriber currently dialing must recommence.
The subscriber hears a new dial tone.
This seldom affects more than one user.
- Some calls are terminated but initiating new calls is still possible.
(Expect temporary overload due to many attempts to redial.)
- The program finds that its model of the switch state is corrupted or different from the state of the real switches.
All current calls in the exchange are terminated.
To proceed would likely connect users at random.
- The program decides that it does not know how parts of the switching hardware are configured and thus that hardware is unusable.
This may be specific to certain subscriber lines even when those lines are not in use.
The computer realizes that its model of the configuration of the real switching hardware is incorrect and ceases trying to set up calls using that hardware.
This may isolate some customers pending human intervention.
- The computer realizes that its model of trunk lines is incompatible with the input from the hardware.
Trunk lines may thus become inoperable.
- The computer can barely say “I give up”, or cannot even say that.
A reboot is attempted.
- Reboot fails and operators fetch magnetic tapes for a clean reinstall of the entire system, including:
  - switching hardware configuration,
  - subscriber information, and maps between hardware and subscriber,
  - trunk configuration.
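Naming the levels is what makes them easy to reason about and talk about. The following minimal sketch gives the seven levels above hypothetical names as an ordered enumeration and assumes that recovery escalates one level at a time until it succeeds; the text only says this explicitly for the last two levels, where a failed reboot leads to a tape reinstall. `ResetLevel`, `escalate`, and the `recovered` predicate are illustrative inventions, not AT&T’s code or terminology.

```python
from enum import IntEnum
from typing import Callable


class ResetLevel(IntEnum):
    """Hypothetical names for the seven levels described above (not AT&T's)."""
    REDIAL = 1                # a subscriber currently dialing must start over
    DROP_SOME_CALLS = 2       # some calls terminated; new calls still possible
    DROP_ALL_CALLS = 3        # switch-state model corrupt; all calls terminated
    QUARANTINE_HARDWARE = 4   # hardware of unknown configuration taken out of use
    TRUNKS_INOPERABLE = 5     # trunk model disagrees with hardware input
    REBOOT = 6                # the program gives up; a reboot is attempted
    TAPE_REINSTALL = 7        # reboot fails; clean reinstall from magnetic tape


def escalate(start: ResetLevel,
             recovered: Callable[[ResetLevel], bool]) -> ResetLevel:
    """Return the first level, from `start` upward, at which recovery succeeds.

    `recovered(level)` reports whether recovery at `level` worked. The top
    level is treated as the last resort, so it is returned without asking.
    """
    level = start
    while level < ResetLevel.TAPE_REINSTALL and not recovered(level):
        level = ResetLevel(level + 1)
    return level


# Example: recovery only works once the system has been rebooted.
final = escalate(ResetLevel.DROP_ALL_CALLS,
                 lambda level: level >= ResetLevel.REBOOT)
print(final.name, int(final))  # REBOOT 6
```

Because the levels are ordered integers, comparisons such as `level >= ResetLevel.REBOOT` read naturally, which is one way to carry the shared vocabulary of the taxonomy into code.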
I think that we heard this description sometime around the time of the infamous Chicago outage, which was the first ever level 7 reset while in service.
Part of the city was without service for a few hours.