Here are some notes on my wish list for tools to help understand OPP (Other People’s Programs). Perhaps we should try to entice some compiler people into the effort.

Years ago I heard a comment from some compiler person at IBM, perhaps Fran Allen. This is how I recall it:

There is scarcely anything of interest to a compiler that is not potentially of great interest to the programmer as well. Indeed, the discipline of independent compilation makes unavailable to both the compiler and the programmer many things that would be useful to both.

A corollary of this adds the run-time debugger as a third party. In this case the paucity of loader semantics deprives the debugger of information that would often be sufficient to debug a highly optimized program. (The compiler has optimized away some variable that it necessarily knew how to materialize, and knew about in a form that a debugger could easily have understood.)

Today a large program makes its way to the RAM locations where it will be executed thru a tree of linker operations, preceded by compilations and likely by textual preprocessing. Each of these steps builds useful information and then discards it. I don’t propose producing textual reports to be read by people, but merely leaving this intermediate data where it can later be accessed, object-wise, by some of the very code that consulted it when it was new.
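
To make this concrete, here is a rough C header of my own invention; none of these types or functions exists today. It sketches the sort of object-wise access a debugger or an editor might have to data that the build steps computed and now throw away.

    /* build_notes.h: purely illustrative; no such library exists today.
       A sketch of the object-level access a debugger or editor might have
       to data the preprocessor, compiler, and linker computed and, today,
       throw away.  All names here are my own invention. */

    typedef struct pp_event  pp_event;   /* a macro expansion, #include, etc.   */
    typedef struct decl_note decl_note;  /* what the compiler knew about a name */
    typedef struct link_note link_note;  /* where the linker finally put it     */

    struct decl_note {
        const char     *name;       /* source-level name                       */
        const char     *type_text;  /* the type as the compiler understood it  */
        const pp_event *origin;     /* preprocessing step that produced the    */
                                    /* declaration, if any                     */
    };

    /* Given an address in the running program, follow the chain of notes
       left behind by each build step.  Hypothetical signatures. */
    const link_note *link_note_for_address(const void *addr);
    const decl_note *decl_note_for_symbol(const char *linker_symbol);
    const pp_event  *pp_origin(const decl_note *d);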

Persistent objects are available only on language platforms unlike those in which preprocessors, compilers, and linkers are typically written. Such platforms currently impose various sorts of costs. Even Keykos, my favorite, could accomplish this only with substantial revamping of the tools.

A Fantasy

The preprocessor has produced a declaration of some variable from source that did not formally look like a declaration. Programs that look only at the source, such as etags and cscope, will not see a declaration. The compiler sees a declaration and builds a structure for it. Several applied occurrences follow, some possibly produced by the preprocessor as well, and the compiler responds correctly to each of these, perhaps leaving additional notes behind for debugging. Each of these notes heads a trail of crumbs that allows reconstruction of the state of the preprocessing where the declaration was produced, and likewise at the applied occurrences.
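
Here is a small C fragment, invented for illustration, of the sort of thing I mean. The macro DEFINE_COUNTER is mine, but the pattern is common; nothing at its use site looks like a declaration to etags or cscope, yet the compiler sees one.

    /* A hypothetical macro of my own; the pattern, though, is common. */
    #include <stdio.h>

    #define DEFINE_COUNTER(name) \
        static long name##_count = 0; \
        static void name##_bump(void) { name##_count++; }

    DEFINE_COUNTER(request)  /* nothing here looks like a declaration of
                                request_count, yet the compiler sees one */

    int main(void)
    {
        request_bump();
        request_bump();
        printf("%ld\n", request_count);  /* prints 2 */
        return 0;
    }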

Some will object that the above patterns for declarations should be avoided. I agree, but sometimes the alternatives are worse.

An Alternative

Another general scheme that leads to the same benefits is to keep enough information that the software can be rebuilt. This is part of a general plan to keep the source in escrow while objects obeying the machine code still exist. The software can then be rebuilt with traps set to observe the state of the build when the code in question is compiled, or when the preprocessor comes to the point of generating the declaration noted by the compiler. This is a form of “running backwards”, which is feasible when the processing steps are driven by programs such as make.
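
What would the escrow have to hold? Here is a minimal sketch in C, again purely hypothetical, of a record sufficient to replay one build step with a trap set.

    /* escrow.h: hypothetical.  The record kept alongside a deployed binary
       so that the build can later be replayed with traps set, instead of
       keeping the intermediate data itself. */

    typedef struct build_step build_step;
    struct build_step {
        const char  *tool;          /* e.g. "cpp", "cc", "ld"               */
        const char  *version;       /* exact tool version                   */
        const char **argv;          /* exact command line, NULL-terminated  */
        const char **input_hashes;  /* content hashes of every input file   */
        build_step **deps;          /* steps whose outputs this step reads  */
    };

    /* Replay one step, stopping when the trap predicate on the tool's
       internal state becomes true, for example: the preprocessor is about
       to emit the declaration in question.  Hypothetical signature. */
    int replay_until(const build_step *step,
                     int (*trap)(void *tool_state, void *user),
                     void *user);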

It is not clear to me which of these two schemes is more complex. It is also not clear which is more expensive in various resources. The latter scheme requires less space but more processing.

The General Explainer Pattern

There is a style of program suited to a particular kind of application, one in which the program is in a position to justify its output and the justification may be fairly short. Hydrodynamics is not such an application, for the link from cause to effect, as tracked by the calculation, may involve many billions of computations. Compiling, however, is such an application, provided the compiler need not explain why it chose the registers it did.
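
Here is a toy in C, invented only to show the style: each value carries a short justification that can be produced on demand. Nothing here comes from a real compiler.

    /* A toy "explainer": each value carries a short justification.
       Invented for illustration; not from any real compiler. */
    #include <stdio.h>

    typedef struct {
        int  value;
        char why[128];  /* one-line justification for this value */
    } explained_int;

    static explained_int fold_add(explained_int a, explained_int b)
    {
        explained_int r;
        r.value = a.value + b.value;
        snprintf(r.why, sizeof r.why,
                 "folded %d + %d because both operands were known constants",
                 a.value, b.value);
        return r;
    }

    int main(void)
    {
        explained_int two   = { 2, "literal 2 in the source" };
        explained_int three = { 3, "literal 3 in the source" };
        explained_int five  = fold_add(two, three);

        /* The justification is short, unlike a hydrodynamics code whose
           answer rests on billions of intermediate operations. */
        printf("%d  (%s)\n", five.value, five.why);
        return 0;
    }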