Learning from Algol 68

This is an appendix to this mess.

Since Algol 68 was so nearly a functional language most of the conceptual apparatus is already in place; it is almost as if they planned on a fully functional language and freaked out at the last minute when the efficiency and wide spread understanding of the stack forced them to compromise and disallow those few constructs that were unnatural to the stack.

Here I delve into their model, hopefully explaining the relevant good ideas there. I use mostly modern terminology but beware slight differences. The A68 definers chose novel terminology which caused obvious problems.

Only mutable things need storage in the A68 execution model. In practice large constant arrays are supported and really require storage but addresses for those arrays are not reified in the language; that is between the compiler and the machine. In the execution model the large array exists, but not in memory! An address of mutable data of type t is reified in the model and the language as a value of type REF t. There are two primitive sources of REF t values (memory addresses to put values of some type t), HEAP t and LOC t. Each of them is an expression that evaluates to a value of type REF t which is an address of space newly allocated and big enough for values of type t. The space will remain allocated (until there are no more copies of corresponding REF t.) A REF t value from LOC t is cursed, however; it has a limited ‘scope’ that forbids its transmittal to places which last so long that the value might be used beyond the natural stack frame discipline. In effect LOC INT grabs space on the stack for an integer.

Here is some plausible A68 code:

(INT x = 42; print(x))

No storage is used; no REF values.

(INT x = 42; x := 4; print(x))

I forgot to say that the assignment operator (:=) takes a REF t on the left and a t on the right. This is illegal x is an INT not a REF INT! x is permanently 42 while it is in scope!

(INT x := 0; TO 10 DO x +:= 3 OD; print(x))

We add 3 to 0, 10 times. There is no evident REF here but we remove the syntax sugar:

(REF INT x = (LOC INT := 0); TO 10 DO x +:= 3 OD; print(x))

All value modification in storage bottoms out the assignment operator. Now we can follow the execution model:

LOC INT grabs space for an integer on the stack,
(LOC INT := 0) puts 0 there and evaluates to its left operand,
The newly bound identifier x is introduced with the permanent(!!!!) value that locates the storage that will accumulate the sum.
The +:= operator comes free with A68 and does what you expect, much like the assignment operator. The x in print(x) indeed has a type REF INT and we print only integers, so the perhaps overly convenient coercion rules ‘dereference’ the REF INT to get the INT to be printed.

The plan is now clear: do a global substitution of LOC by HEAP, after desugaring, and we are done. Ain’t quite so easy. Finding the sugar is slightly tedious, and it makes normal parts of the program run real slow.

Now, just for fun, we construct that large constant array:

([]INT x = ([1000]INT y; FOR j TO 1000 DO y[j] := j*j OD; y);
  print(x(13)))

The type [1000]INT is ‘a row of 1000 integers’ to which we never assign. ‘=’ is an identity declaration, not an assignment, in the execution model. It is a constant row of constant integers! But its value is not a change from some previous value, it is its only value. The mutable array y is used to compute the one and only value of x, but y is quickly discarded. The above speaks in the execution model. A real compiler is likely to see thru this artifice and omit one of the arrays. The careful reader may suspect that I have omitted some magic here.

An earlier language BLISS had similar notions of storage.