Several years ago we started to design a format for storing information within a computer memory that was to capture the plan for how state of some ordinary computation was coded in memory.

As a program runs it maintains in memory, a mutable state by executing instructions that reference memory. There are properties of that state representation that were invented by the programmer, probably a person, that are necessary to preserve. Any proof of correctness for the program, formal or otherwise, must almost certainly deal with these properties. Most techniques for dealing with program correctness require some explicit representation of the properties. We thought that it might be useful to specify a format for coding this meta information without first specifying in detail how it might be used. We call such meta information a plan for the data of the program.

I must say up front that the project did not succeed but I record here some of the ideas that still seem good to me. Perhaps others may build on some of these ideas.

I will list at least one bias for the project that some will find strange. We considered a program to be those binary instructions obeyed by some processor. This is in contrast with the source form of a program that is often the focus of correctness disciplines. If data plans are to be required then compilers must produce and include them with their output. This is no small task but for more recent languages the compilers have much of the information already at hand. Compilers which compile from type safe languages have the rudiments of such a plan already provided in the program source. For more permissive languages the remaining work is all in the direction of limiting the code to what it suggested in the declarations that it would limit itself to. Sometimes the gap can be filled in by the programmer including extra information explaining how some permitted violation of type rules is in fact benign. Ideally such plans are not dependent on source language and can thus deal with programs expressed in several languages. I do not expect, however, that any scheme that we invent today will cover all future methods of programming. In we had invented this in 1955 it would not have countenanced linked lists.

Just as complex data types are composed of collections of simpler types, so are their plans composed of plans for the corresponding simpler types. The mode of composition is subtly different, however. We begin by describing plans for simple data types and build up from there to more complex types.

There are but a few primitive data types and these can be distinguished in a byte, perhaps a nibble.

An array plan is merely a struct of a pointer to the type and two index limits.

A structure plan is an array of plans, one for each array field.

Note that an array plan is a structure and a structure plan is an array. Bizarre!


Precursors

The first time I saw some of these ideas was in IBM’s PL/I system where two kinds of information accompanied or were produced by the compiled code:

Type information
which was provided by the compiler for library routines that implemented language defined procedures such as I/O. Library routines would consult such information to print complex structures without the programmer having to write complex print routines. For data referenced at call sites to these libraries, pointers to such compile time metadata were included was provided by the compiler and accompanied the code. This data is constant between compilations.
The dope vector
was dynamic information about a type that varied as the program ran. This was mainly array bounds and origin. An array of five numbers was of the same type as an array of seven numbers but when such an array was passed as an argument the routine demanded to know the length which was potentially dynamic.
This is a similar project.