Thesis: the Scheme construct (set! (string->symbol ...) ...) is almost as bad as C’s construct (int *)(random()) = 0;

Scheme constructs with “!” are annoying, yet seem occasionally vital. I tried to write an efficient simulation of a mutable array based upon immutable arrays. There was a logarithmic overhead with a bad constant coefficient.

I think that OCaml addresses this problem better than either Scheme or my proposal below. Perhaps OCaml’s compromise is better engineering. The pattern below is consonant the Keykos solution where it works well. The pattern is perhaps not so good for a language.

Mutable Scheme things are said to denote or be bound to locations. To evaluate a variable conceptually you first determine the location to which it is bound and then fetch what is currently stored in that location.

I explore here ramifications of binding a variable directly to a value instead of to a location. In the process replace “variable” with “identifier”. In official Scheme (lambda (x y) ...) binds x and y to new locations and then puts the arguments in those locations. Rees has proposed a new construct, the cell which, like the current location is mutable. In this scheme a cell is a first class value in its own right. I propose to go farther than Rees. Here is a table of before and after situations.
R5RSWhatever
location is not a typeCell is a type: (cell? (make-cell 3)) => #t
Lambda binds variable to new location which initially holds the argument. Lambda binds identifier to argument.
set! is Syntaxset! is a Standard procedure
Locations cannot be passed in messagesCells, being fist class, can be arguments.
A pair has two locations withinA pair is immutable and has two values, one or both of which may be a cell, of course.
A vector or string has a location per element.Vectors of cells are practically necessary.
Do we need immutable vectors? Another type??
If “x” is a variable then the expression “x” evaluates to the value currently held in the location to which x is bound. If “x” is a variable (bound to a cell) then fetching the contents of that cell must be explicit or involve syntactic sugar not characteristic of Scheme. I tentatively propose “.x” as short for Rees’s (cell-ref x). This is from Bliss.
If you need to mutate a parameter you can just do it:
(lambda (x) ... (set! x 42) ... x ...).
You must plan a bit in order to mutate a parameter:
(lambda (y) (let ((x (new-cell y))) ... (set! x 42) ... .x ...)
The definition of letrec refers to locations. Alternatively one could speak more abstractly by saying

The <init>s are evaluated in some order and then bound to the respective variables. It is an error to refer to such a variable before it is bound.
It would then be better to replace “<init>” by “expression” and also “variable” by “identifier”, as they can no longer vary. It must be made clear that the evaluation of “(lambda (x) (cons x y))” refers to neither x nor y while the evaluation of “(cons x y)” refers to both.
We might indeed do away with all talk of locations except for the cell. (set! cell obj) becomes a standard procedure that takes a cell and another value. Afterwards the cell refers to the obj. (lambda (x) (set! x 3)) is invalid unless supplied with an argument which is a cell.
Here is the nearest thing to the rub: Fetching from the cell must be explicit. At this point Algol 68 (from which I borrow heavily here) deploys its powerful compile time coercion machinery to make most references to variables convenient and familiar. Dereferencing an Algol68 variable is easy to write. In most contexts “x” evaluates to the content of the location to which x is bound. Suppose that “x” is a Scheme identifier bound to a cell. “.x” might refer to the value in the cell. That notation is from the pre-historic language called Bliss that also made locations into first class values. An assignment between Bliss variables would appear as “a = .b”. Both a and b are bound to locations that can vary. C sort of means the same thing when it writes “*x” but C ideas lead one astray and confuse! (Explain the difference between a pointer and an lvalue!! They unify in Bliss and Algol68.)

Scheme code to double a number is still (lambda (x) (+ x x)) but no locations are conceptually involved.

Vectors

The Scheme vector is clearly built from locations but the report is not explicit on this in its description of vectors. It does mention vectors in the introduction to locations. How else could one perform vector-set!? In the new world a cell-vector type may be practically needed. The dramatic efficiency of the permutation inversion routine:
(lambda (x)(let* ((n (vector-length x))(y (make-vector n)))
   (do ((j 0 (+ j 1)))((= j n) y)(vector-set! y (vector-ref x j) j))))
requires some sort of mutable arrays. It is not clear whether an immutable vector is practically required as well.

The yield of (make-vector 10 (new-cell)) will not serve as a mutable vector for “(new-cell)” is evaluated just once and we need 10 cells. We need a new vector forming standard procedure such as:
Procedure: (proc->vector k n proc)
which returns a vector with n elements, the i’th being the yield of (proc i). proc must be a procedure of one argument.

Now we can rewrite the above program:

(lambda (x)(let* ((n (vector-length x))
        (y (proc->vector n (lambda (i) (new-cell)))))
   (do ((j 0 (+ j 1)))((= j n) y)(set! (vector-ref y .(vector-ref x j)) j))))
The above is obscure and incompatible with some of the rationale of the original Scheme vector. The initialization of the new vector: (proc->vector n (lambda (i) (new-cell))) seems to allocate storage locations for cells obliviously to the storage for the array. Enough hackery in the interpreter could ameliorate this but this makes it more difficult for the programmer to manage the efficiency of his program. Algol68 supports just such constructs as “array of references to int”, but such constructs are seldom useful. An Algol68 programmer would instead use the construct “reference to array of ints” which is just Scheme’s vector, except the “int” is not required in Scheme.

I see no computational requirement for a Scheme vector beyond a vector of locations, which is the current Scheme vector. Pedagogically I would introduce the vector as a close relative to the cell.

There may however be a reason for sharing vectors between procedures that should be unable to communicate via the locations within a vector. I address this issue in a not so orthogonal proposal to add sensory access to Scheme.

Ramifications

Alas deleting the conceptual step between an identifier and the value it stands for does not speed up the real system, unless the real system was indeed naïve. It does simplify by not requiring variables to have tentative memory locations allocated to them. A compiler knows sooner that the value of an identifier is constant thruout the lifetime of an environment.

I presume that the top level environment would be immutable. Much grief stems from trying to provide the correct side effects for: (set! car cdr). Doing it correctly produces even more grief! If I understand correctly some (most?) Scheme implementations run slower due to the requirement to constantly monitor the meaning of car. I know no reason why it is good that the meaning of car can change half way thru evaluating some unrelated expression, except for remaining faithful the semantics of variables. Each separate yield of the Scheme Factory must have its own copy of locations holding all the Standard Procedures, lest instances gossip among themselves. The solution is to banish the variables, except in those few cases where they are needed.

Similar Work

Darius Bacon tells me of a variant of this idea was done at JPL by Ed Gamble.