I imagine here a strategy to use the conventional stack for conventional patterns and use the slower more general alternatives only as necessary. My claim is that the compiler can efficiently tell when the fancy stuff is needed. I have not yet proven that claim, however. See some more A68 detail.
This is more about the nature of the compiled code and machine representation of values, than the structure of the compiler. This is a paper by nearly the same title as this page. That paper describes a uniform way to compile functional languages. I won’t repeat much of that. The tricks I explore here presume to compile conventional program patterns conventionally and resort to the tricks of the paper above only in the cases once considered exotic and excluded for thinking them not useful, or for difficulty in implementation. I want to consider a system that mainly uses the hardware that is conventionally used for the stack, and do novel things only when necessary.
First class functions are insinuating themselves into ordinary languages; there is even an effort to add some sort of lambda construct to Java. Here are three language properties, largely orthogonal, that describe the languages I am concerned with:
Apple has a clever new addition to C, the block, that provides function like objects that survive the stack frame where they were created. The life time of a block is explicitly managed like other values in C, C++ or Objective C. Here, however, I implicitly assume garbage collection and syntactic lambda constructs such as the above.
While C lacks lambda expressions, the issues apply equally to C and so I will use C, and gcc’s nested function extensions to illustrate the issues. gcc provides neither persistent functions, nor anonymous functions, but it does provide access to free variables. The syntax is merely marginally more awkward.
Before we proceed there is an issue to be settled. With recursive routines several stack frames may exist at once for the same function. It is not entirely obvious which frame a free variable should refer to. I will not go into the confusion in detail here but suffice it to say that Church got it right and all the current functional languages that I know of all do it Church’s way. The Algol 68 language definition, for all its faults, introduced the best terminology and imagery in my mind to make clear what frame is meant by a free variable in a function body. That imagery also generally corresponds to the most efficient way, I think, of implementing the necessary semantics. Algol 68 refers to the environ (short for ‘environment’) that is a conceptual runtime construct that defines the current values of all the free variables. Every evaluation of a language expression, including lambdas, is evaluated against some environ. Environs come and go as the computation proceeds and several may exist at once.
In simple languages the stack reifies the environ and we want to use the stack too when we can. The reification of a function is a pair of words, the first points to the code compiled from the body of the function definition; the second points to the reification of the environ.
The environ packages up pointers to all of the stack frames that were accessible as the lambda expression was evaluated to produce the function. This package and a pointer to the code compiled from the body of the function definition is the machine representation of the function. Actually Algol 68 goes one step further and hints that the environ need only package pointers to the frames ‘necessary’ to the function body.
There are common circumstances where the code compiled from the function body can refer directly to a free variable in its parent’s stack frame. Consider:
double ps(int j, double a[j], double b[j]){ double ip(double x[j], double y[j]) {int k; double s=0; for(k=0; k<j; ++k) s += x[j]*y[k]; return s;} return ip(a, b)/(ip(a, a)*ip(b, b));}‘j’ is free in the inner function ip but defined as a parameter to ps. The inner function ip can access j on the stack just as the outer function could using the stack pointer.
This pattern happens when a function f defined within a function g, is invoked during the same invocation of g as f is invoked. (SIC) I will try to say that more clearly. The text that defines f is included in the text that defines g. g is invoked and during that invocation, f is created and invoked. f is not invoked subsequent to the end of the invocation of g. This useful program, described here has routines 3 deep with several instances of free variables at the bottom.
First class values, such as functions here, can be the value of an expression, of a variable, an argument, or of a parameter. These arrangements allow the function to get to other places where the free variables are not in scope, and other times where the stack frame where the variable once lived has been reallocated. Here is a program written in four languages which exploits a convenient hack to instantiate two generators to be run concurrently. The two instances keep their state in a contrived stack frame that persists during the execution. Hacks like this encourage object oriented languages.
The ‘Alternative’ in the Scheme program avoids the hack and provides a routine, mi that generates generators, two of which are generated to do the computation. That inner anonymous function definition has the free variable a which is defined on the scope of mi. mi returns that function each time so that a refers to different initialized storage—we exploit Scheme’s persistent functions.
There comes a time during compilation when the defining occurrences of variables and the corresponding applied occurrences have been identified. At that point in should be possible to decide for each applied occurrence whether it is possible for the address to escape. There are three possible outcomes:
Algol 68 provides specific static rules, enforced by some compilers that prevent such escapes where the function is remembered beyond the duration of the invocation where it was produced, at the expense sometimes of preventing safe usage. If a compiler were to resort to allocating such offending variables on the heap, then we would be done. Distinguishing between the two cases above would be an extra benefit.
One more thing: We have not said how the code compiled from the function body with the free variables gets access to them. This has been a task since Algol 60. Such a body may have free variables accessing storage in several different stack frames. Those frames will still be allocated in the Algol, PL/I, Pascal family, but their relative positions are not known at compile time. They are not accessible via the stack pointer. They are naturally chained together, however. This is where the ‘environ’ concept shines, following the Algol 68 definition suggests chaining the ‘necessary parts’ of the environ.
In the Algol 68 abstract execution model each block of code ran in an environ which determined the meaning of all the identifiers. An environ was composed of a locale and another environ. (The locale was the current frame and the other environ was the rest of the stack.) When a lambda expression was evaluated it was expressed as a pointer to the compiled code and a pointer to the environ where
C | {int x = cos(z); res = sqrt(x)+x;} |
gcc C | ({int x = cos(z); sqrt(x)+x;}) |
Scheme | (let ((x (cos z))) (+ (sqrt x) x)) |
OCaml | let x = sin z in (sqrt x) +. x |
OAlgol 68 | (REAL x = sin(z); x + sqrt(x)) |
{ref int w[10]; for(int i = 0; i<10; ++i) {int j; w[i] = &j;}}Should put 10 different addresses in the array w. Gcc and clang each put the same address in each array element. Algol 68 refuses the assignment statement quoting a clause that addresses of variables cannot be stored in places that live longer than the variable. Just such variation is demanded however if the natural syntax of conventional languages are to support functional patterns. It would be natural to produce 10 counting objects in C by writing:
typedef void C(void); typedef int R(void); typedef struct{C *up; R *rd;} ob; ob o[10]; for(int i = 0; i < 10; ++i) { int c = 0; void inc(void) {++c;}; int r(void) {return c;}; o[i] = (ob){inc, r};};This compiles and runs.
gcc cob.c -Wmost -std=c99 -fnested-functions ./a.outUnfortunately there is only one counter among the objects. The pattern works in Scheme and OCaml however.