Documentation Theory

Thoughts on Documentation

I am often critical of current standards of software documentation and even hardware documentation. I try to recollect here how I came to some of my unpopular opinions.

I had few articulated theories of software documentation when I was in charge of operating systems for Tymshare, except the idea that one should be able to find, in one place, the information on how to write code for one of our systems. We did a pretty good job, I think. We embarked on a 370 project which required precise knowledge of the 370 hardware system. I got the IBM 370 Principles of Operation (POO) and studied it closely for the first time. Gradually I came to know how to find the information that I needed in that manual, and also came to realize that the information was complete in a way that I had not seen before. It seemed to require of the reader, as background, only an elementary knowledge of English and a certain mathematical maturity, though the specific mathematical knowledge did not extend beyond arithmetic. There was no assumption that the reader had already mastered several machines and that the job at hand was merely to say how this particular machine worked. See this too.

The POO arose when IBM embarked on an unprecedented venture: producing a family of distinct machine designs that would appear identical to programs, including operating systems. Up to that time it had been thought that only two machines of the same model from the same assembly line could be expected to appear identical. Some still remembered when the same machine would not work the same from one day to the next. Perhaps an engineer had modified it to improve its function. Perhaps a component had merely failed.

To this end IBM produced the POO with such clarity that the several projects to build family members would interpret it in the same way. I suppose that extensive diagnostics were also written to test for compliance. The main magic in the scheme, I think, lay in editors with an uncanny knack for detecting ambiguous language. Chief among these, I think, was Andris Padegs.

It seems that a collective passion for precision developed among the architects who were aware of the fears that the machines would be incompatible.

Early family members spanned about two orders of magnitude in performance, and only a little latitude was left to the builders of specific family members. This latitude was spelled out very clearly in the POO, however. There were two areas where divergence was not well documented: (1) the IO and (2) the machine maintenance instructions. The latter involved access to the underlying implementation hardware, which indeed differed considerably within the family. The problems with IO are more difficult to address, but I have a few ideas which I will not present here.

We attempted to emulate this documentation style for Keykos. Indeed we conceived most of the kernel function as an extension of the hardware.

Philosophical

I take away from this a motto:

Clarity precedes Accuracy!

for there is no hope for Accuracy without Clarity.

Yet what is clarity? A good friend of mine, who seems to be the local authority on how the Java libraries actually work, as opposed to what the documentation says, seems to have acquired his knowledge from reading the Java source code for those libraries. I suggest that he internalizes what he learns in some unnamed form. The alternative would seem to be remembering the source verbatim. That would be absurd, for it denies the very purpose of the excellent abstraction that the Java language provides. If you had to consider the details of all the routines you called in order to understand your code, none of today's interesting applications would be possible.

If users of compilers had to read the source code of the compilers they used, there would be few programmers. We have done a tolerable job of describing computer languages, but not the Java libraries. Without clarity, documentation is "not even wrong", as Pauli said of some ill-expressed physics theory.

I postulate that my friend internalizes the library function in a form close to how the POO was framed. I share the controversial opinion that natural language closely parallels natural thought processes, and that what can be thought can be said, although some thoughts may not yet have corresponding jargon familiar to the thinker, or shared with his colleagues.

More specifically, I think that the Java library documentation is universally hazy about which objects bear what state. I need an abstract understanding of the state of the objects that I employ. The documentation systematically avoids these issues. The abstract state need not correspond to the real state, but the software must act as if it did. It is always clear from the POO where state is in a 370. I am not sure whether this is the main problem with the Java documentation.
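
To make the point about abstract state concrete, here is a minimal sketch in Java. The class RecentValues and its methods are hypothetical, invented only for this illustration; they come from no real library. The comments state the abstract state (a short sequence S) and say how each operation changes it, while the real state is a circular buffer that looks nothing like S. This is one way such documentation might read, not a claim about how any existing documentation is written.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * RecentValues (hypothetical, for illustration only).
 *
 * ABSTRACT STATE: a sequence S of at most `capacity` integers, oldest first.
 * A newly created object has S empty. The capacity never changes.
 * The object bears no other state that a caller can observe.
 */
class RecentValues {
    // REAL STATE: a circular buffer. It does not look like S,
    // but every method below acts as if S itself were stored.
    private final int[] slots;
    private int start = 0; // index in slots of the oldest element of S
    private int size = 0;  // current length of S

    /** Creates an object with S empty. The capacity must be positive. */
    RecentValues(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new int[capacity];
    }

    /** Appends v to S. If S would then exceed the capacity, the oldest element of S is dropped. */
    void add(int v) {
        int cap = slots.length;
        if (size < cap) {
            slots[(start + size) % cap] = v;
            size++;
        } else {
            slots[start] = v;          // overwrite the oldest element
            start = (start + 1) % cap; // the next oldest becomes the head
        }
    }

    /**
     * Returns a fresh copy of S, oldest first. S is unchanged, and later
     * changes to the returned list do not affect this object.
     */
    List<Integer> snapshot() {
        List<Integer> out = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            out.add(slots[(start + i) % slots.length]);
        }
        return out;
    }
}
```

A reader of these comments can predict the result of any sequence of calls without reading the method bodies: with a capacity of 2, adding 1, 2 and 3 leaves S as 2, 3. The comment on snapshot also answers the question of where state lives, since it says the returned list shares nothing with the object.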

These ideas seem opposed by these.