A Bone Simple Instruction Set

This is a quixotic proposal for an ultra-clean and simple virtual machine. As secondary goals, it should be at least moderately efficient in space and time. There are two subgoals:

A teaching tool for ‘how computers work’,
A real language into which real compilers can compile real programs which we want to perpetuate into the far future.

The second goal presumes a solution to the separate problem of storing bit strings for such long periods. Such a project might also lead to a format for algorithms that we would send to the stars, much like the signals that we seek with the SETI project. The film Contact postulated the reception of algorithms that we then instantiated.

If memory serves me right, the programming manual for the IBM 701 was about 20 pages, and much of that bulk was devoted to describing the IO features, which were complex in order to make the IO hardware cheap rather than simple. (I recall that the spec for the Model 33 Teletype was one page, including the electrical characteristics.) The 701 computational instructions, however, were both simple and simply described. Arithmetic was fixed-point, and self-modifying code was necessary because the machine had neither general registers nor index registers.

I would propose 2’s complement binary with data sizes that are powers of 2. I think this is simpler than the arguably more convenient powers of 10; I seek simplicity here over convenience. I actually think binary wins even on convenience when capturing today’s code. I see nothing wrong with conventional 8-bit byte addressing, which seems simple, efficient and convenient. If the choice is mine, it will be big-endian. I see no reason not to stick to 64-bit addresses, for most code today can easily adapt to that size, and that should be adequate for the purposes of this project.
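To make the byte-addressing and endianness choices concrete, here is a minimal C sketch, assuming memory is a flat array of 8-bit bytes and that a 64-bit value is stored big-endian (most significant byte at the lowest address). The names mem_store64, mem_load64 and as_signed are hypothetical, not part of any spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value big-endian starting at byte address addr. */
static void mem_store64(uint8_t *mem, size_t mem_size, uint64_t addr, uint64_t value)
{
    assert(addr <= mem_size && mem_size - addr >= 8); /* stay in bounds */
    for (int i = 0; i < 8; i++)
        mem[addr + i] = (uint8_t)(value >> (56 - 8 * i)); /* high byte first */
}

/* Load a 64-bit value stored big-endian at byte address addr. */
static uint64_t mem_load64(const uint8_t *mem, size_t mem_size, uint64_t addr)
{
    assert(addr <= mem_size && mem_size - addr >= 8);
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value = (value << 8) | mem[addr + i];
    return value;
}

/* Read the same 64 bits as a 2's complement signed integer, without
   relying on implementation-defined signed conversions. */
static int64_t as_signed(uint64_t bits)
{
    if (bits <= (uint64_t)INT64_MAX)
        return (int64_t)bits;
    /* bits >= 2^63: the value is bits - 2^64, which equals -(~bits) - 1 */
    return -(int64_t)~bits - 1;
}
```

Addition, subtraction and the low 64 bits of a product are the same bit patterns under either reading; right shifts, division, comparison and the high half of a product are where signed and unsigned differ.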

I would propose general registers whose contents contribute to effective address calculation. I am queasy about full IEEE floating point, for that alone would exhaust my entire complexity budget: the IEEE floating-point specs are rather heavier than the entire 701 manual. I might consider an abbreviated floating-point spec with about the functionality of the IBM 704’s, which would serve most purposes. Such a spec might eliminate many corner cases while retaining the plain real numbers that IEEE formats can denote. I would propose some simple but deterministic floating semantics, perhaps following the IBM 360 style of floating semantics, which is parasitic on properties of real numbers.

Much of the software we want to preserve produces pixels, so an optional pixel output mechanism is needed. Square pixels with three fixed-size color components seem simple, efficient and familiar to modern software. I suppose that input in this format is also suitable. Another option is 16-bit sound samples at some fixed, or perhaps specifiable, sample frequency; such a mechanism would also serve quite a range of other analog IO. I suspect that the new touch-screen input modes, while important, are too new to consider yet.
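A minimal sketch of the pixel output side in C, assuming each square pixel has three 8-bit components (red, green, blue) stored row-major with no padding. The component size, component order and the fb_* names are hypothetical choices for illustration; the proposal fixes none of them.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical framebuffer: width x height pixels, 3 bytes per pixel. */
typedef struct {
    uint32_t width, height;
    uint8_t *rgb;              /* 3 * width * height bytes, row-major */
} framebuffer;

/* Allocate a framebuffer cleared to black; rgb is NULL on failure. */
static framebuffer fb_create(uint32_t width, uint32_t height)
{
    framebuffer fb = { width, height, calloc((size_t)width * height, 3) };
    return fb;
}

/* Write one pixel at column x, row y. */
static void fb_put(framebuffer *fb, uint32_t x, uint32_t y,
                   uint8_t r, uint8_t g, uint8_t b)
{
    assert(x < fb->width && y < fb->height);
    uint8_t *p = fb->rgb + 3 * ((size_t)y * fb->width + x);
    p[0] = r;
    p[1] = g;
    p[2] = b;
}

/* Return a pointer to the three components of one pixel. */
static const uint8_t *fb_get(const framebuffer *fb, uint32_t x, uint32_t y)
{
    assert(x < fb->width && y < fb->height);
    return fb->rgb + 3 * ((size_t)y * fb->width + x);
}
```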

I don’t see that such a platform would necessarily exclude proprietary formats, programs or data, but I wonder what it would mean for data to remain someone’s property for 100 or 1000 years. I shudder.

An operand stack discipline (RPN) is an attractive plan, but I don’t want to fix on that yet. Extant intermediate compiler formats or extant VM byte codes may be right. The LLVM, JVM and .NET formats probably have at least ideas to contribute, though I suspect that each of their specs is at least an order of magnitude too big.

I like the idea of redundant specs. The figure of merit is not the sum of the sizes of these redundant specs, but rather the minimum of these sizes. The minimum is not really right, however, because some such specs will depend on complex prerequisites such as the lore of denotational semantics, the specs of some extant language, or some other obscure discipline. Still, obscure disciplines will reveal difficulties not evident in other disciplines. Redundant specs can achieve durability, especially if redundant implementations become available against which to test applications intended for archiving.

Perhaps another option is a protection mode suitable for kernels. This might serve to support combinations of OSes and applications with incestuous relationships. I hope that most applications would run on the platform with at most a few standard libraries implemented upon it, without features like multithreading or any other handshaking with the kernel. I should think that stack discipline is purely an application issue, and that the instruction set would support all the common stack ideas.

Don Knuth’s MIX architecture should be considered.

I purposely avoid a design that wants to be implemented in silicon, for that would soon drag the design in the direction of complexity; simplicity is the prime goal here.

The Instruction Set Architecture

An ISA involves a few issues. The instruction format is described early but designed late, so I will not start there. Another issue is the naming of operands and the size of the register file. I am thinking RISC-like, in which instructions that address memory don’t prescribe operations, and conversely. We have already proposed 2’s complement binary. Most machines presume a default operand size for ‘scalars’, which is perhaps overly simple, but here we want simplicity. I propose 64-bit operands.

For convenience, conventionality and efficiency, shifts by amounts taken from registers or from the instruction stream.
Both signed and unsigned right shifts.
Produce a bit mask with n low-order 1’s; n from register or instruction.
Boolean ops: and, or, not.
Unsigned add, sub, mul, div.
Mul can optionally provide the high 64 bits of the product.
Div can optionally take a 128-bit dividend.
IEEE float +, -, *, / for ‘ordinary numbers’, round to nearest, ties to even.
Load and store instructions include a modest offset within the instruction, to be added to the contents of 0, 1 or 2 registers.
Conditional branches on a register being zero or on its high bit.
Branch, sending the old PC to a register.
Branch to the address in a register.
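Several of these operations have edge cases that a spec must settle. This C sketch covers the shifts and the low-order mask under one hypothetical rule: shift amounts are taken modulo 64, since C leaves shifts of 64 or more undefined and the VM must choose something. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift; the amount may come from a register or the instruction. */
static uint64_t shl64(uint64_t x, unsigned n)
{
    return x << (n & 63);
}

/* Unsigned (logical) right shift: vacated high bits become 0. */
static uint64_t srl64(uint64_t x, unsigned n)
{
    return x >> (n & 63);
}

/* Signed (arithmetic) right shift: vacated high bits copy the sign bit.
   Written on unsigned values so it does not depend on C's
   implementation-defined right shift of negative numbers. */
static uint64_t sra64(uint64_t x, unsigned n)
{
    n &= 63;
    uint64_t r = x >> n;
    if (x >> 63)                   /* negative in 2's complement */
        r |= ~(UINT64_MAX >> n);   /* fill the top n bits with 1's */
    return r;
}

/* Mask with n low-order 1's, for 0 <= n <= 64. */
static uint64_t mask64(unsigned n)
{
    assert(n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```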
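The addressing and branch items can be sketched as a decoded instruction and a step function. Everything here is hypothetical: the insn layout, the register count of 16, the NO_REG marker for an absent register, the 8-byte instruction size, and the choice to compute branch targets with the same offset-plus-registers rule as loads and stores. ‘Old PC’ is taken to mean the address of the following instruction, so that a branch to that register can serve as a return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine parameters; the proposal fixes none of these. */
enum { NREGS = 16, NO_REG = -1, INSN_SIZE = 8 };

typedef enum { OP_BZ, OP_BNEG, OP_BAL, OP_BR } opcode;

/* A decoded instruction; no particular encoding is implied. */
typedef struct {
    opcode op;
    int rd;          /* register tested, written, or branched through */
    int ra, rb;      /* 0, 1 or 2 address registers; NO_REG if unused */
    int64_t offset;  /* modest signed offset carried in the instruction */
} insn;

typedef struct {
    uint64_t reg[NREGS];
    uint64_t pc;
} cpu;

/* Offset plus the contents of 0, 1 or 2 registers, modulo 2^64.
   Loads and stores would use the same rule. */
static uint64_t effective_address(const cpu *c, const insn *i)
{
    uint64_t ea = (uint64_t)i->offset;
    if (i->ra != NO_REG) {
        assert(i->ra >= 0 && i->ra < NREGS);
        ea += c->reg[i->ra];
    }
    if (i->rb != NO_REG) {
        assert(i->rb >= 0 && i->rb < NREGS);
        ea += c->reg[i->rb];
    }
    return ea;
}

/* Execute one branch-class instruction. */
static void step(cpu *c, const insn *i)
{
    assert(i->rd >= 0 && i->rd < NREGS);
    uint64_t next = c->pc + INSN_SIZE;
    switch (i->op) {
    case OP_BZ:      /* branch if the register is zero */
        c->pc = c->reg[i->rd] == 0 ? effective_address(c, i) : next;
        break;
    case OP_BNEG:    /* branch if the high bit of the register is set */
        c->pc = (c->reg[i->rd] >> 63) ? effective_address(c, i) : next;
        break;
    case OP_BAL: {   /* branch, saving the return address in rd */
        uint64_t target = effective_address(c, i); /* before rd changes */
        c->reg[i->rd] = next;
        c->pc = target;
        break;
    }
    case OP_BR:      /* branch to the address held in rd */
        c->pc = c->reg[i->rd];
        break;
    default:
        assert(!"unknown opcode");
    }
}
```

Because a call saves its return address in an ordinary register, any stack discipline is left to software.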