The early 20th century saw three mathematical models for general computation: Turing machines, Church's λ calculus, and the general recursive functions. These were soon proven to be equivalent in expressive power. I studied each in about 1953 and, even though I was eagerly anticipating real computers, I was not impressed with what the mathematical theory told us about what computers could actually do; I felt I already knew. Still it was comforting to learn that there was a delineated there there.

That was then; there is now in addition the Turing tar-pit.

I did not anticipate that any of these models would be relevant to real computers. I was wrong about the λ calculus. Whereas the universal Turing machine is wildly inefficient, modern Scheme and LISP platforms perform some useful large tasks well, very much in the spirit of the λ calculus. I do not know the situation, or the prospects, for general recursive functions. There are real problems for which fairly pure forms of the λ calculus, as exemplified by Scheme, provide an excellent platform. McCarthy’s LISP took insight and notation from Church. Algol 60 adopted lexical scoping from the λ calculus. (I am not sure that its designers knew they were doing so; lexical scoping was already implicit in a number of mathematical works.) Scheme moved LISP closer to the original λ calculus constructs and adopted its lexical scoping exclusively. Perhaps it was the first language with procedures of indefinite extent and lexical scoping, both of which were implicit in the λ calculus. While recently reading Pierce’s book I realized that an even more faithful hewing to Church’s original constructs may be feasible. I begin that attempt here; more on the connection with the mathematical ideas below. I am not quite ready to claim that this is practical.

Scheme provides recursive functions in an ad-hoc manner; the define and letrec constructs allow circular definitions. From Pierce’s book I learned, once again, about the Y combinator, which had seemed arcane and merely theoretical when I first met it; it was much newer then and I did not grasp its potential role for real computers. In the bare λ calculus one can define cons, car and cdr instead of positing them as primitive standard functions as Scheme does. Scheme’s if and cond syntax, along with the values #t and #f, can likewise be defined instead of assumed. All of this was known in Church’s time but I don’t know when it was first considered for real computers.
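
For concreteness, here is a sketch in Scheme of the boolean part of that program; these are the familiar Church encodings, not necessarily the exact definitions discussed in the text. A boolean is a function that selects one of two arguments, and if becomes plain application (the branches are wrapped in thunks so that Scheme’s applicative order evaluates only the chosen one):

    (define TRUE  (lambda (t) (lambda (f) t)))   ; pick the first argument
    (define FALSE (lambda (t) (lambda (f) f)))   ; pick the second argument

    ; IF applies the boolean to the two branch thunks and invokes the winner.
    (define (IF c then-thunk else-thunk)
      (((c then-thunk) else-thunk)))

    ; (IF TRUE (lambda () 'yes) (lambda () 'no))  =>  yes

Pairs can be encoded in the same spirit; they are sketched after the next paragraph, and the Y combinator reappears below under Towards Utility.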

I was amazed to learn that there are two senses of defining the traditional LISP primitives in terms of pure lambdas. There is the pattern which leads to a LISP- or Scheme-like language that allows a routine to feel out the shape of an input parameter and adapt its behavior to the structure of the input. Such power is provided by Scheme primitives such as pair?, which asks whether its argument is indeed a pair, and null?, which asks whether a list is empty. The original 1925 plan, however, was to require each piece of code to know the shape of the parameters passed to it. The existence of the pair? primitive implies that the passed value carries information from which the answer may be derived. Without such extra information pair? cannot be implemented. It is like programming in machine language, where there are no commands to test whether a word holds a floating point value; the program must already know, or arrange yet additional storage to indicate the format of the data. There are likewise two sorts of plan for coding LISP-like data structures in pure lambdas: a rich mode with the extra information to respond to queries such as pair?, and an austere mode without such extra information. I soon gave up trying to write programs in the austere mode. It requires usage patterns which are some peculiar combination of machine language programming and LISP patterns. I found a coding of the rich mode in pure lambdas but was unable to get it running.
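
Here, again only as a sketch and not necessarily the coding I attempted, is how the two modes might look in Scheme. In the austere mode a pair is just a function of a selector and nothing about the value announces that it is a pair; in the rich mode every value carries enough extra structure to answer such a query (the #t and #f below could themselves be replaced by Church booleans):

    ; Austere mode: the classic Church pair.  pair? cannot be written.
    (define (kons a d) (lambda (sel) (sel a d)))
    (define (kar p) (p (lambda (a d) a)))
    (define (kdr p) (p (lambda (a d) d)))

    ; Rich mode: a value takes two continuations and applies the one
    ; matching its own kind, so queries like pair? become possible.
    (define (rich-cons a d) (lambda (if-pair if-other) (if-pair a d)))
    (define (rich-atom x)   (lambda (if-pair if-other) (if-other x)))
    (define (rich-pair? v)  (v (lambda (a d) #t) (lambda (x) #f)))
    (define (rich-car v)    (v (lambda (a d) a)
                               (lambda (x) (error "not a pair"))))

    ; (rich-pair? (rich-cons 1 2))  =>  #t
    ; (rich-pair? (rich-atom 7))    =>  #f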


A funny thing happened on the way to the computer. Computer systems based on the λ calculus can be said to understand λ expressions directly and perform what Church imagined as a purely platonic process. Church provided an informal description of how to ‘evaluate’ λ expressions, but his formal material is concerned only with the equivalence of expressions. Church seemed satisfied to lay out equivalence conversion rules that people might apply to show that certain expressions were equivalent to others. He gives no formal notion of the meaning of a λ expression, but his formal equivalence notion was clearly meant to relate λ expressions with the same informal meaning. With λ-based computer languages the λ expression became a prescription for actions leading to a value of the expression; i.e. a program.

Most of mathematics is in the ‘what-is’ cast. Here and there math provides algorithms which constitute ‘how-to’ information. Church provides reduction schemes which prescribe automatic actions which amount to an algorithm. I need to identify how his algorithms relate to what I have here.

Part of the magic of LISP was the S-expression, which at once serves to express a LISP program and also the data it operates upon. The S-expression does so by denoting an abstract tree-like value that can be reasoned about and has simpler logical properties than the linear textual S-expression itself; there are no issues of association or precedence for abstract trees. S-expressions were implemented using computer pointer technology, which involves storing addresses of data in other data. Pointers were quite new then (1960). (I speculate that nature discovered the idea about 200,000,000 years ago.) This is a clever idea that is used in many sorts of languages and is especially well exploited in the LISP family. These trees give the programming language some expressive and flexible values to chew on; there is something besides functions to occupy the attention of a LISP program. Better yet, they can be printed as S-expressions.

This is a nice intro to using the bare λ calculus as a programming language. Here is some Scheme code that closely follows the early exposition. It is important to note that in this paper, as in Church’s original work, there are two levels of notation in use.

In my Scheme code the define construct is used in just those cases where capital letters are introduced in the book. In the book you are to imagine a textual substitution of the definiendum. In the Scheme code the interpreter internalizes defined values which serve, for the purposes at hand, as surrogates for the expression. Ironically the only values that Church contemplates formally are functions, and while Scheme can deal with all of these values, it cannot display them. My meta procedures rd and pr turn Scheme integers into Church integers and back. Thus at least the Church numbers can be seen in operation.
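
The text names rd and pr but does not show them; here is a plausible shape for such procedures (my guess at their form, not necessarily the definitions used). rd builds the Church numeral for n, a function that applies f to x exactly n times, and pr counts applications to recover a Scheme integer:

    (define (rd n)                        ; Scheme integer -> Church numeral
      (lambda (f)
        (lambda (x)
          (let loop ((k n) (v x))
            (if (zero? k) v (loop (- k 1) (f v)))))))

    (define (pr c)                        ; Church numeral -> Scheme integer
      ((c (lambda (k) (+ k 1))) 0))

    ; (pr (rd 5))  =>  5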

An implementation might adopt Church numbers but provide optimization under the hood, replacing Church numbers with conventional arithmetic, all without revealing the subterfuge. This optimization would certainly include adding two optimized numbers to obtain an optimized number. A print function would be able to print at least Church numbers in conventional format.
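
As a toy illustration of the idea (at the library level only; a real implementation would do this inside the evaluator, and the names here are invented): an optimized number can still behave as a Church numeral when applied, while answering a private probe with its machine value so that addition and printing can take the fast path:

    (define (make-num n)
      (lambda (f)
        (if (eq? f 'native)               ; private probe, for the library only
            n
            (lambda (x)                   ; otherwise act as the Church numeral
              (let loop ((k n) (v x))
                (if (zero? k) v (loop (- k 1) (f v))))))))

    (define (fast-add a b) (make-num (+ (a 'native) (b 'native))))
    (define (show a) (display (a 'native)))

    ; (show (fast-add (make-num 2) (make-num 3)))  prints 5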

Towards Utility

I think that even a theoretical programming language needs at least a predicative define, which defines new things in terms of old things only, thus excluding circular definitions. This results in finite but possibly large programs. (Recursion is still possible, via the Y combinator; see the sketch below.)
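
For example (a sketch, using ordinary Scheme arithmetic for the base case): with an applicative-order Y combinator both defines below are predicative, neither names itself, and yet fact computes a recursive function:

    ; Applicative-order Y (often called Z); the eta-expansion
    ; (lambda (v) ((x x) v)) delays the self-application.
    (define Y
      (lambda (f)
        ((lambda (x) (f (lambda (v) ((x x) v))))
         (lambda (x) (f (lambda (v) ((x x) v)))))))

    (define fact
      (Y (lambda (self)
           (lambda (n)
             (if (= n 0) 1 (* n (self (- n 1))))))))

    ; (fact 5)  =>  120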

I think that a predicative define together with the pure (three line BNF) λ calculus may suffice for big software. It can be adapted to independent compilation. I have no objection to extensions that can be reduced to the primitive forms.
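
For reference, one common rendering of that three-line grammar, in the parenthesized spelling used in this paper:

    <term> ::= <variable>
             | (λ (<variable>) <term>)        ; abstraction
             | (<term> <term>)                ; application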

This requires that an expression with no free variables be replaceable by any other expression with no free variables that yields the same value. Perhaps Scheme 5 achieves this, but some implementations have lost the property by including the backwards-compatible one-argument eval (which was in Scheme 4 but excluded from 5).

Possible Compromises

On another point: I would like to see an application that is significantly improved by dynamic-wind.

In conclusion, the pure λ calculus might provide a concise, expressive, moderately efficient language that supports modularity in three short lines of BNF syntax plus a few standard values (which are functions) that can be defined entirely in the language. Some accommodation to separate compilation (à la R6RS Scheme) and possibly some syntactic sugar will also be necessary for big software. To do the arithmetic normally required of computers, some of these standard functions must be special-cased, which increases implementation complexity.


Here is a brief synopsis of Church’s original development.

If I need two bodies of code X and Y, and Y needs what X yields, then I can write ((λ (x) ((λ (y) (my code)) Y)) X). My code can refer to the values yielded by X and Y as x and y respectively. I presume here that employing previous code can always be done by packaging the previous code as a λ expression, evaluating it once, and running the new code with that yield bound to some unique identifier. Note that a fairly complex set abstraction is packaged just this way here.
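
A concrete instance of the pattern, with illustrative bodies standing in for X, Y and my code:

    ((lambda (x)
       ((lambda (y)
          (y x))                  ; "my code": uses both x and y
        (lambda (n) (* n n))))    ; Y: yields a squaring procedure, may use x
     7)                           ; X: yields 7
    ; => 49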

Here is a site about LISP hardware.


See about Combinators.
Relate to this.
Some good stuff.