The Explainer Pattern

Many, perhaps most, messages between applications or between computers are in some format which is defined recursively. Programs to interpret such messages tend to be organized along the lines of the format definition. Imagine a recursive descent parser of the message. The same applies to file directory structures as well as messages, or in linked data structures in RAM shared between sender and receiver.

When some sanity check within the interpreting code finds a problem it is good form and customary to issue some sort of human form of diagnostic asserting non conformance to some standard, or perhaps to some limitation of the interpreting program. Unfortunately most often such a message is phrased in the jargon used to define the format and meaning of some sub-sub-sub-component of the original message. The meaning of this message is often opaque to even an expert who understands well the meaning and purpose of the containing message. There is usually no clue to the context of this sub component, nor to the intermediate contexts.

I describe here a rather efficient and natural pattern to issue much more useful messages. The pattern relies on a language feature that originated in Algol 60, was propagated to Pascal and PL/I, lost in C, recovered in GNU C, and mangled in current OO languages, except Smalltalk (and perhaps E). I do not know how to do this naturally in C++ or Java. The feature is nested procedure definitions.

The pattern is simply to add an argument to invocations of routines to process sub-components of a component. This is the point where the recursive descent program descends. The argument is a procedure that is called only when troubles arise and an explanation in terms of an outer context is required. For this note we call this procedure the explainer. The explainer is a closure rather than the mere routine pointer available in C. The definition of the explainer is near the call site and in the necessary lexical context. The closure permits the code of the explainer natural access to the circumstances of the particular invocation responsible for the calculation now in trouble. Often the explainer code will in turn invoke older explainer code passed as a procedure in the call that created the explainer’s context. If the explainer code cannot be defined within the scope of the interpreter code responsible for the relevant outer context, then it is difficult, unnatural and expensive to arrange for the explainer code to access the particulars of the outer context.

By exploiting the logical recursive structure of natural language, explainers can often produce a complex grammatically correct description of the situation, referring at once to the several levels of abstraction. The result may, however, exceed the capacity and stack depth of the natural language processing abilities of people.

To be concrete a recursive descent compiler could easily explain that the offending syntax was found while trying to make sense of a while construct, starting on line 83 within a procedure named “expunge”.

In PL/I and Algol68 a goto statement in the explainer to a label out of the body of the explainer resets the stack to the state it was in when the context of the label was the newest context. gcc has no such feature but setjump will probably do.

Shape of the Code

This is a skeleton of q code module for some recursive interpreter. There might be several such modules in an interpreter, one per subcomponent type.

void M3(... void explain()){
   char * name = ...;
   ...
   if(...){//This M3 component has a subcomponent of type M5.
      void exp5in3(){
        printf(" an M3 component called %s is within a", name);
        explain();
        printf(" which began at offset %x within the message.", offset);}
        ...
        M5(..., exp5());
   }
   ...  
}

I use a variation of this pattern in a recursive walk of a Unix file directory tree, looking for certain troubles. Each directory causes the top routine to be reinvoked, passing in a parameter which is a seldom invoked procedure, PrintPathName, to cause the printing of the entire path name of the directory apropos to the current invocation. PrintPathName is obviously recursive in that it calls an outer instance of itself for printing the path name of the enclosing directory before it prints the name of the directory for which it was invoked.