One goal of RamPlan is to support type safety proofs for programs similar to what a Java byte code verifiers does for interpreted Java programs. The hope is that C and C++ compilers might produce verifiable programs, sometimes with extra help from the programmer. Such compilers might be modified to produce parts of the RamPlan and accompany the compiled code with the corresponding plans. An immediate corollary to type safety is capability security. These properties are normally a necessary step towards proving stronger things about programs.
I have spent a number of months working on the design of RamPlan and important problems remain. This note is to encourage others to add their ideas.
RamPlan is a language to be dealt with by programs much as machine language is produced mainly by programs and read by computers.
A RamPlan expresses a number of facts about the state of the memory of a running program. In this regard first order predicate calculus would subsume RamPlan and RamPlan will always be incomplete in this regard, unless perhaps first order predicate calculus is added as a wart. I anticipate that what is expressible in RamPlan will suffice for the mentioned goals in most cases and be processed more quickly by smaller programs such as checkers.
A type safety checker would look at the code and the RamPlan. It is not germane whether they occupy the same address space. I assume that the checking code would find blocks of code between which the assertions expressed in RamPlan held, and prove that each block preserved these assertions. Such blocks are approximately what compiler designers call “basic blocks”. Auxiliary assertions regarding register contents between these blocks would also be required. Procedure declarations from conventional languages would provide some of these assertions. So far these goals are nearly identical to Java’s byte code verifier.
We will, of course, insist on RamPlan being strong enough to describe its own format. We also propose that the stack be described by the plan. The vtables of C++ are a particular challenge that we intend to meet.
I have been influenced by IBM’s PL/I system’s “dope vectors” which contain information not available at compile time regarding array sizes and such. The dope vectors are run time data produced and consumed by the compiled code. The compiler also passes on type information available at compile time to aid a diagnostic library to report values in a format convenient for the programmer.
We start out mimicking type theories for conventional “higher level languages”, indeed we will follow C as that language already descends to near machine level.
I will not assign specific values just yet, but they can be assigned early. There will be types for ints of length 8, 16, 32, 64. We do not distinguish signed from unsigned. There will be three or four floating lengths.
We shall probably need to address bit fields that are not aligned on byte boundaries, but not yet.
The plan for an n element structure is a pointer to successive words holding 0, n, and the n respective field plans. (0 is a code for structure.)
The plan for an array of n elements is a pointer to successive words holding 1, n, and a plan for the array element.
Amusingly the plan for an array is a struct and the plan for a struct is an array.
The plan for a pointer is merely a pointer to two words holding 2 and the plan to the located data. The plan for a pointer guaranteed not to be null (0) has code 3 instead.
This unnamed language is a bit like C especially in its operators. It only uses operators that yield values and are without side effects. In dispenses with explicit structures and the special rules of pointer arithmetic. Integers can be dereferenced.
The imported C operators are infix +, -, *, /, %, <<, >>, &, |, ^, <, <=, >, >=, &&, || and prefix +, -, !, *, and perhaps conditional expressions. We will use “=” or “==” for an equality comparison. We might say that all values are 32 bits, or perhaps 64. We might have distinct dereferencing operators for different memory sizes. Perhaps operators *1, *2, *4, *8 in place of *. We usurp “->” for implication as we have no built-in theory of structure fields. “a -> b” means “!a || b” and has the least precedence. We retain the other precedence rules of C.
Such expressions do not describe execution but a value, often a boolean.
To describe a linked list of integers less than 11 and terminated with a zero pointer one might write:
LL11(z) -> !z || *z < 11 && LL11(*(z+4));
If we are to form arguments that multiple processors running in one address space preserve these invariants, we will probably need to support some theory of locks, although this theory may not be necessarily built in.
Perhaps a useful source of insight is the Macintosh programming scheme where Handles are meant to provide unique locators for blocks of application data. Code is supposed to go thru the handle each time to access the data and at the same time the data can be moved on short notice since its address is known at only one place. Of course the nature of the instruction set requires that the real data address be copied into registers in order to use the data. The issue then arises how often the code must reload the locator. Ultimately Apple had to publish a long list of routines that were safe to call without reloading the handle. Adherence to these rules seems to have been entirely manual and error prone. It was also burdensome for for Apple to avoid extending this list. Complex applications had to extend this list with their own routines. Digitec compilers of the 60’s used a similar scheme internal to their compilers.
The only answer I know to these problems for our context is attributes of holders of pointers that declare their content to be unique. It is more complex than that for if x is a unique pointer to a structure then x+4 is not x but its existence must preclude considering x from being unique.