Garbage Collection

When the logic of the evaluator uses the platform’s stack to track subgoals of the evaluator the following problems arise:

Some of the fundamental logic is obscured. This is normally good but in our exercise it hides mechanism we may want to reveal.
There are two fixed allocations of non-interchangeable memory: stack and heap, or in our case the work array.
The current simple eval hides values on the stack that any garbage collection must see.
and, of course, tail recursion.

We consider here a scheme that eschews recursive C routines. This means that we explicitly build continuations as linked lists thru the work space. We also introduce the traditional linked free list and with this we can explicitly free work space as we return from evaluations.

Soft Stack

The simple interpreter uses the C stack as the stack of sub-goals for an evaluation. We explore ideas here to transform this to a plainer mechanism. This will allow proper detection of storage exhaustion and simpler progress toward garbage collection. I do not like conservative garbage collection. This code seems to work. This code required a careful set of types. The types will govern both the run-time and its GC. Writing this code makes me thankful for normal stack logic as supplied by languages such as C, especially for recursive routines. I find a.c fairly clear and b.c very opaque. The latter is much like machine language for a machine without a stack. Even C’s normal block structure discipline is violated by gotos into blocks.

At the top of the program, top, we have the following situation:

The value in x is an Mcode pointer to a compiled expression whose value is wanted by the continuation in c.
The free variables in the expression have values indicated by the environ in e.

A continuation identifies code to begin obeying when the wanted value has been placed in the variable v of type val.

When a subexpression has been evaluated control reaches the code at if(c) whereupon v must hold the value and variables e and x are dead. We read c and go to the identified code that wanted the value. Continuations are allocated and freed by the code logic that must ‘invoke’ eval recursively. The value c will be the same as upon the matching transfer to top. The application of a function (appl) to an argument is presumably the only such case.

A free list of env frames will be kept in order to reclaim continuation space. The type env is as of this point in the design a triple pun:

It has the original function of chaining the parts of an environ.
It has the function of chaining unallocated frames.
It chains continuations.

In order to avoid the cost of initially chaining all free storage together we continue the simple convention that all storage between index 0 and wp is free together with those elements on the free list freeL.

Types for Run-Time

Here is a BNF for the data of the running λ program. This is a psychoanalysis of the stack form of the evaluator. This syntax produces trees and not strings. The following information does not portray what tags are needed or used, but it should help determine that. Nor does it determine where the pointers are that must break every loop in the BNF, but it should help determine these too. The following (leaf) categories are terminal for the purposes here:

<prim fun>,  <integer>, <Mcode pointer>, Null

The productions:

<value>      ::= <prim fun> | <λ function> | <integer>
<λ function> ::= (<Mcode pointer> <environ>)
<environ>    ::= (<value> <environ>) | Null
<Fcont>      ::= (<Mcode pointer> <environ> <cont>)
<Acont>      ::= (<value> <environ> <cont>)
<cont>       ::= <Acont> | <Fcont> | Null

Some explanations are needed:

<value>: merely the internal coding of a conceptual λ calculus value.
<λ function>: a function produced by evaluation an expression beginning: “(λ”
<environ>: The chained structure that provides values for free variables.
<Fcont>: The structure that remembers why the evaluation of the left side of an appl was requested.
<Acont>: The structure that remembers why the evaluation of the right side of an appl was requested. Its <value> is the function to be applied. Its <environ> is that of the original appl expression. Its <cont> is the continuation of the original appl expression.
<prim fun>: The implementation of a primitive function which is built into the language and not defined by a λ expression.
<integer>: The internal form of an integer.
<Mcode pointer>: Merely a locater of compiled Mcode.
Null: An empty value.

There was an <Econt> construct in an earlier design which remembered why the evaluation of the body of the applied function was requested during the evaluation of an appl. It served ultimately only to locate the next older continuation and tail recursion eliminated this storage.

A pointer must intervene in the <cont> loop; I presume that a continuation contains another by reference (pointer). That would seem to require that the representation of a continuation, <cont>, include an explicit tag. Likewise and <environ> includes an <environ>, presumably by reference. The two ‘umbrella’ types are <value> and <cont> and I think that every expression in the run-time is definitely one or the other. I believe that the GC can be coded knowing this also and thus no tag is needed to distinguish between these two types.

The system state proposed above should hold strictly at least as GC begins and ends.

Ramifications to C Types

The array work, which supplies space for the computation, is typed by a union of <value> and <cont> and any pointer into this array will, by pointer context, know which sort it is.

Garbage collection will need its own state field in each of these.

confusion

I wrote the text below the rule below when I was confused about how the working version of b.c worked. I think I answer my questions below but sometimes I slip back into confusion and so I keep the text including the erroneous attempt to improve a.c.

There is currently a discrepancy between the abstract definition of <Econt> above and the corresponding type Econt in the program that I cannot now make fail. Econt is an empty struct! I verbalize the logic here and try to argue that the program is wrong.

To evaluate (<Expression>₁<Expression>₂) when <Expression>₁ yields a λ function, as distinct from a numeric operator or a number, we evaluate the body of the λ function in a temporary extended environ. By ‘temporary’ I mean that following this last evaluation we return the environ to the free list in the following line of code

{cnt* t=c; pw(c->c.a.v); c=c->C; pc(t);}

which has the 2nd and last invocation of pw. My understanding is that an environ is part of the permanent yield of an expression (λ<ident><Expression>). I suppose that the current code works because the temporary environ needs only to supply a value for the free variables of the body. The environ then serves no further purpose. If this is so it seems there might be an optimization. An expression that embodies this dilemma is:

The resolution to this confusion, I think, is that the temporary environ supplies the value bound to a free variable for the duration of the evaluation, after which the environ is superfluous. This explanation thwarts tail recursion for we must come back to free the environ.

Q: Now I must understand why I cannot make the same change to a.c! This fails.
A: b.c frees the space for the pointer (a -> a.u.v) to the value whereas ar.c frees the value! For b.c, the space for the value is allocated by the code env * ee = gw(); and that space is not released until gc might do so.

end confusion

A compiler must avoid tail recursion for the following program:

void x(int *);
void r(){int q = ...;
  x(&q);}

Further thoughts here.