Knowing vs. Doing

2002 April

Here is a parable that I first heard as told about Steinmetz, the man who wrote the book on the engineering of AC electrical power systems. The time was about 1915.

The owner of a large electric generator called its manufacturer, GE, for whom Steinmetz then worked, to repair the generator. Steinmetz arrived, studied the situation for 15 minutes, and then applied one blow to the generator, after which it returned to correct operation.

Steinmetz presented a $4000 bill. The customer was aghast at paying $4000 for 15 minutes of work. Steinmetz replied that the value of the service lay in knowing what to do, not in doing it.

There are many ways in which the parable is absurd, but its point has never been more apropos than today, applied to the cost of knowing how to make Unix work. Certainly Unix is not the only such technology with its own priesthood. Other operating systems each have their own supporters, not to mention several other major systems, and dozens more.

While each of these seems to be the epitome of automation, each requires a staff of Knowers as distinct from Doers. Architects of these systems have long worked at reducing the number of things that people have to do to keep the system going smoothly, but in that very process they have tended to increase the number of things that people have to know to keep the systems going. Very often the latter increase is excused as an effort to achieve the former decrease.

Traditionally this tradeoff, in favor of knowing more and doing less, has paid off by reducing the cost of running these systems. I think it has gone too far in some cases, and that the designers have not noticed. As they administer systems they are aware of how much they do, but not of how much they must know in order to do it.

The main symptom appears when you must hire several people not to do things, but to know how to solve problems that may seldom arise.

A microcosm of this effect is illustrated in the history of maintaining computers through the decades of the 1950s, 60s, and 70s. I was there and remember it well.

In the 1950s machines cost several million dollars and were delivered bundled with engineers to keep them running. The IBM 701 (1954) was a simple enough machine that the engineer would seldom need to consult the hardware wiring diagram to fix it. The problem was likely in one of the highly replicated parts of the machine, and simple diagnostics would lead the clever engineer directly to the component that needed replacing.

Skipping over several generations of machines where the problem gradually changed, we come to the Stretch (IBM 7030) with 250,000 transistors. The greater reliability of the transistor was about matched by the greater component count, and the system was only marginally more reliable. I recall watching the engineers seeking to fix a problem. They almost always consulted the wiring diagrams in their quest. This was nearly unheard of for the 701.

The intellectual tools necessary to fix the machine had overflowed the heads of the maintenance engineers and had come to include the paper wiring diagrams. The necessary talent had shifted to the ability to learn the logic of subsystems within the machine that had never failed before in the engineer's experience. The situation was exacerbated by the fact that the machine was down while this learning was going on.

Leaky Software Abstractions

Skipping ahead several more machine generations, we come to a situation where the hardware failure problem is largely solved: the machines don't fail, and their written external specifications are sufficiently clear that you do not need a suite of engineers whose sole job is to know how the machine does what it does. The software situation, however, is worse than ever.

Jon Udell's Lizard Brain Surgery notes that in most software systems, abstractions leak, and that the result is the user having to understand and descend to lower levels to fix things. The cost of doing is typically decreased, even as the cost of knowing increases. CPUs have fairly simple external specifications that grow much more slowly than their internal complexity. Hardware designers have learned to specify hardware so that the abstractions don't leak. I think that software designers could too if they really tried. Keykos really tries to make its abstractions as solid as the CPU hardware abstractions. There is perhaps too much of an Iron Man mentality suggesting that any system manager really should be adept at fixing things at the lowest level.

Some software does well at abstraction. It has been a long time since I had to descend to machine language to get a compiled program to run. Compilers insulate me from knowing machine language as well as from writing it. The first Fortran compiler had enough bugs at release that patching the machine code was necessary. Since I already knew machine language well, this did not deter me. Patching machine code avoided an expensive recompile when the bug was in the Fortran source, and was the only choice when the bug was in the compiler. Using Fortran saved me much time even though I knew machine language well. I was indeed astonished when, a year later, people who knew no machine language were able to write and debug programs.

I think that the abstractions of compilers work well because the definition of programming language semantics has attracted attention comparable to that given to the definition of CPU semantics. The definition of what it means to open a file lags behind.