Defining Computer Languages

These are some remarks on styles of defining computer languages. I very much doubt I will say anything new here; I merely collect a few ideas.

The first Fortran manual defined expressions recursively in plain English. Natural languages are good at that. BNF provides a stylized form for these recursive definitions, and BNF is now used to define the entire syntax of a language. I think this is a slight improvement on conventional BNF.

Define the universe of values

The meaning of most computer languages is defined in part by ascribing values to expressions of the language. An expression in a program is evaluated at times during the execution of the program, and the meaning of the expression is the value that it takes on. (Digression into more detail) Most ordinary languages support a class of values that can be defined recursively. The set of values can be defined by something much like BNF. The terminal productions of the BNF for the expression syntax are then explained in terms of the terminal productions of the BNF for the values. Freiburghouse’s PL/I Manual for Multics also provided a recursive definition of the possible meanings of an identifier declaration, which went some way beyond the types the identifier could be bound to. Such meanings were not closely related to the syntax of the declarations proper, nor were they isomorphic with the values present in a running program.

Scheme

Scheme’s definition is an admirable language definition but not really along these lines. Here are fragments of a definition along such lines.

Scheme sort of cheats in that the expressions of the language and their values come from the same space: S-expressions. We thus need just one BNF for both, which makes this exercise shorter but less representative. Strangely, the Scheme Report fails to take advantage of this.

An S-expression is

a symbol,
nil,
a number,
an ordered pair of S-expressions,
a vector of S-expressions, which is an ordered sequence of some specific number of S-expressions,
a character,
a string.

There are many unspecified symbols and numbers. For each ASCII character there is a character.
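The recursive definition above can be sketched as a set of data types. This is only an illustration, not part of any Scheme definition: the class names (Symbol, Nil, Pair, Vector, Char) and the helper is_sexpr are hypothetical, and mapping numbers and strings onto Python's int, float and str is an assumption made for brevity.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Symbol:
    name: str  # which symbols exist is left unspecified, as in the text


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Pair:
    first: SExpr  # an ordered pair of S-expressions
    rest: SExpr


@dataclass(frozen=True)
class Vector:
    items: Tuple[SExpr, ...]  # a fixed number of S-expressions, in order


@dataclass(frozen=True)
class Char:
    code: int  # one character per ASCII code

    def __post_init__(self) -> None:
        if not 0 <= self.code < 128:
            raise ValueError("only ASCII characters are defined")


# Assumption: numbers are Python int/float and strings are Python str.
SExpr = Union[Symbol, Nil, int, float, Pair, Vector, Char, str]


def is_sexpr(x: object) -> bool:
    """Check membership in the universe of values, following the recursion."""
    if isinstance(x, bool):  # bool is a subclass of int in Python; exclude it
        return False
    if isinstance(x, (Symbol, Nil, int, float, Char, str)):
        return True
    if isinstance(x, Pair):
        return is_sexpr(x.first) and is_sexpr(x.rest)
    if isinstance(x, Vector):
        return all(is_sexpr(item) for item in x.items)
    return False
```

Like the definition it mirrors, the sketch says nothing about how an S-expression is typed or printed; it only says which values exist.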

Note that we have given no clue about how to type the external representation of an S-expression.

The value of an S-expression is either another S-expression or a procedure. A procedure consists of an S-expression and an environment.
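A minimal sketch of that last sentence follows, assuming a toy encoding that is not Scheme's own: symbols are Python str, numbers are int or float, and lists are Python tuples. The names Env, Procedure and evaluate are hypothetical, and only quote, lambda and application are handled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Env:
    bindings: Dict[str, Any]
    parent: Optional["Env"] = None

    def lookup(self, name: str) -> Any:
        env: Optional[Env] = self
        while env is not None:
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise NameError(name)


@dataclass
class Procedure:
    lambda_expr: tuple  # the S-expression ("lambda", params, body)
    env: Env            # the environment in which the lambda was evaluated


def evaluate(expr: Any, env: Env) -> Any:
    if isinstance(expr, str):            # a symbol: look it up
        return env.lookup(expr)
    if isinstance(expr, (int, float)):   # a number evaluates to itself
        return expr
    head = expr[0]
    if head == "quote":                  # value is another S-expression
        return expr[1]
    if head == "lambda":                 # value is a procedure
        return Procedure(expr, env)
    # Application: evaluate operator and operands, then evaluate the body
    # in a new environment whose parent is the procedure's own environment.
    proc = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    _, params, body = proc.lambda_expr
    return evaluate(body, Env(dict(zip(params, args)), proc.env))
```

For example, with k = ("lambda", ("x",), ("lambda", ("y",), "x")), evaluating (k, 1) in an empty environment yields a Procedure whose environment binds x to 1, and evaluating ((k, 1), 2) yields 1.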

Language definitions generally describe some fictitious process that

is designed to be easy to describe and understand,
yields results equivalent to those of a real implementation.

Often there are compile-time and run-time phases in this fictitious process, which may match those of a real implementation. Lexically scoped languages conventionally bind defining occurrences to applied occurrences of identifiers before the program starts. If the fictitious process follows this practice, then it is clear to the student of the language that there is no run-time binding. The JavaScript definition describes this binding as a run-time process. I think that the JavaScript interpreters do indeed do it dynamically. I think that there are security holes in that very process.
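To make "binding before the program starts" concrete, here is a sketch of a resolution pass over a toy lambda language. The encoding is an assumption chosen for illustration (symbols as Python str, lists as tuples), and the function name resolve and the ("ref", depth, index) form are hypothetical. Each applied occurrence of an identifier is replaced by the address of its defining occurrence, so nothing is left to look up by name at run time.

```python
def resolve(expr, scopes=()):
    """Bind applied occurrences to defining occurrences before execution.

    scopes is a tuple of parameter tuples, innermost scope first.
    An identifier becomes ("ref", depth, index): depth counts enclosing
    lambdas outward, index is the position in that lambda's parameters.
    """
    if isinstance(expr, str):                      # an applied occurrence
        for depth, params in enumerate(scopes):
            if expr in params:
                return ("ref", depth, params.index(expr))
        # An unbound identifier is reported before anything runs.
        raise NameError(f"unbound identifier: {expr}")
    if isinstance(expr, (int, float)):
        return expr
    if expr and expr[0] == "lambda":               # a defining occurrence
        _, params, body = expr
        inner = (tuple(params),) + scopes
        return ("lambda", len(params), resolve(body, inner))
    # Application: resolve operator and operands in the current scopes.
    return tuple(resolve(part, scopes) for part in expr)
```

Resolving ("lambda", ("x",), ("lambda", ("y",), ("x", "y"))) gives ("lambda", 1, ("lambda", 1, (("ref", 1, 0), ("ref", 0, 0)))): the names are gone, and a misspelled identifier would have been rejected without running anything.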

This too