This is a quixotic proposal for an ultra-clean and simple virtual machine. As secondary goals it should be at least moderately efficient in space and time. There are two subgoals:
If memory serves, the programming manual for the IBM 701 was about 20 pages, and much of that bulk was devoted to the IO features, which were complex because the IO hardware was designed to be cheap rather than simple. (I recall that the spec for the Model 33 Teletype was one page, including the electrical characteristics.) The 701 computational instructions, however, were both simple and simply described. Arithmetic was fixed point, and self-modifying code was necessary because the machine had neither general registers nor index registers.
I would propose 2’s complement binary, with data sizes being powers of 2. I think this is simpler than the arguably more convenient powers of 10. I seek simplicity here over convenience. I actually think binary wins even for convenience in capturing today’s code. I see nothing wrong with conventional 8-bit byte addressing, which seems simple, efficient and convenient. It will be big-endian if I decide. I see no reason not to stick to 64-bit addresses, for most code today can easily adapt to that size, and that should be adequate for the purposes of this project.
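To make the memory model above concrete, here is a minimal sketch in Python. It assumes a flat byte-addressed memory, operand widths of 1, 2, 4 or 8 bytes, and big-endian 2’s complement encoding. The class name Memory and its methods are hypothetical names chosen for illustration; nothing here is part of a settled design.

```python
# Hypothetical sketch of byte-addressed memory holding big-endian,
# two's-complement integers whose sizes are powers of 2.
# The names Memory, store and load are illustrative assumptions.

ALLOWED_WIDTHS = (1, 2, 4, 8)  # operand sizes in bytes, all powers of 2


class Memory:
    def __init__(self, size):
        # One 8-bit byte per address; a real machine would allow
        # 64-bit addresses, but only `size` bytes are backed here.
        self.bytes = bytearray(size)

    def _check(self, address, width):
        if width not in ALLOWED_WIDTHS:
            raise ValueError("width must be 1, 2, 4 or 8 bytes")
        if not 0 <= address <= len(self.bytes) - width:
            raise IndexError("address out of range")

    def store(self, address, value, width):
        """Store a signed integer, most significant byte first."""
        self._check(address, width)
        # to_bytes raises OverflowError if value does not fit in width bytes.
        self.bytes[address:address + width] = value.to_bytes(
            width, "big", signed=True)

    def load(self, address, width):
        """Load a signed integer, most significant byte first."""
        self._check(address, width)
        return int.from_bytes(
            self.bytes[address:address + width], "big", signed=True)
```

With this sketch, storing -2 as a 4-byte value at address 0 leaves the bytes ff ff ff fe in memory, and loading the last of those bytes alone also yields -2, which is one small convenience of 2’s complement.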
I would propose general registers whose contents would contribute to effective address calculation. I am queasy about full IEEE floating point, for that would shoot my entire complexity budget. The IEEE floating point specs are rather heavier than the entire 701 manual. I might consider an abbreviated floating point spec with function about like that of the IBM 704, which would serve most purposes, perhaps eliminating many corner cases while retaining the plain real numbers that IEEE formats can denote. I would propose some simple but deterministic floating point semantics, perhaps following the IBM 360 style of floating point semantics that is parasitic on properties of real numbers.
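One way registers might contribute to an effective address is sketched below in Python. The base-plus-index-plus-displacement form, the count of 16 registers and the function name effective_address are all assumptions made for illustration; the proposal itself fixes none of them. The sum wraps modulo 2^64 to match a 64-bit address.

```python
# Hypothetical sketch: effective address = base + index + displacement,
# wrapped to 64 bits. The register count and the addressing form are
# assumptions for illustration, not part of the proposal.

ADDRESS_MASK = (1 << 64) - 1
REGISTER_COUNT = 16  # assumed number of general registers


def effective_address(registers, base, index, displacement):
    """Add two register contents and a displacement, modulo 2**64."""
    total = registers[base] + registers[index] + displacement
    return total & ADDRESS_MASK


registers = [0] * REGISTER_COUNT
registers[1] = 0x1000  # base of some array
registers[2] = 8       # byte offset of an element
address = effective_address(registers, 1, 2, 4)  # 0x1000 + 8 + 4
```

Because the arithmetic is modular, a negative displacement works without a special case.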
Much of the software we want to preserve produces pixels, and so an optional pixel output mechanism is needed. Square pixels with three fixed-size color components seem simple, efficient and familiar to modern software. I suppose that input in this format is also suitable. 16-bit sound samples at some fixed, or perhaps specifiable, sample frequency are another option. The latter would also serve quite a range of other analog IO. I suspect that the new touch-screen input modes, while important, are too new to consider yet.
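The pixel and sample formats described above might be laid out as in the following Python sketch. It assumes 8 bits for each of the three color components, row-major order, and big-endian signed 16-bit samples; those particulars and every function name here are hypothetical.

```python
# Hypothetical sketch of a pixel buffer with three fixed-size color
# components and of 16-bit sound samples. The 8-bit component size,
# row-major layout and big-endian samples are assumptions.

BYTES_PER_PIXEL = 3  # red, green, blue, one byte each (assumed)


def make_framebuffer(width, height):
    """Return a zeroed (black) buffer for width x height pixels."""
    return bytearray(width * height * BYTES_PER_PIXEL)


def pixel_offset(width, x, y):
    """Byte offset of pixel (x, y) in a row-major buffer."""
    return (y * width + x) * BYTES_PER_PIXEL


def set_pixel(framebuffer, width, x, y, rgb):
    start = pixel_offset(width, x, y)
    framebuffer[start:start + BYTES_PER_PIXEL] = bytes(rgb)


def get_pixel(framebuffer, width, x, y):
    start = pixel_offset(width, x, y)
    return tuple(framebuffer[start:start + BYTES_PER_PIXEL])


def encode_samples(samples):
    """Encode signed 16-bit samples, most significant byte first."""
    return b"".join(s.to_bytes(2, "big", signed=True) for s in samples)
```

The same encode_samples layout could carry other analog signals, since nothing in it is specific to sound.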
I do not imagine that a platform such as this would necessarily exclude proprietary formats, programs or data, but I wonder what 100- or 1000-year-old data property might mean. I shudder.
An operand stack discipline (RPN) is an attractive plan, but I don’t want to fix on that yet. Extant intermediary compiler formats or extant VM byte codes may be right. There are LLVM, JVM and .NET formats that probably have at least ideas to contribute. I suspect, however, that the specs of each of those are at least an order of magnitude too big.
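For readers unfamiliar with the operand stack idea, the following Python sketch evaluates a program in RPN order. The three operation names and the 64-bit wraparound are assumptions for illustration only; this is not a proposed instruction set.

```python
# Hypothetical sketch of an operand stack (RPN) evaluator.
# Integers in the program are pushed; operation names pop two operands
# and push one result, wrapped to 64-bit two's complement.

def wrap64(value):
    """Reduce an integer to the signed 64-bit two's-complement range."""
    return ((value + (1 << 63)) & ((1 << 64) - 1)) - (1 << 63)


OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def run(program):
    """Execute a list of integers and operation names; return the stack."""
    stack = []
    for item in program:
        if isinstance(item, int):
            stack.append(wrap64(item))
        else:
            right = stack.pop()  # the most recently pushed operand
            left = stack.pop()
            stack.append(wrap64(OPERATIONS[item](left, right)))
    return stack


result = run([2, 3, "add", 4, "mul"])  # (2 + 3) * 4
```

The appeal is that instructions need no operand fields at all, which keeps the encoding small and the description short.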
I like the idea of redundant specs. The figure of merit is not the sum of the sizes of these redundant specs, but more the minimum of these sizes. The minimum is not really right, however, because some such specs will depend on complex prerequisites such as the lore of denotational semantics, the specs of some extant language, or some other obscure discipline. Obscure disciplines will reveal difficulties not evident in other disciplines. Redundant specs can achieve durability especially if redundant implementations become available against which to test applications intended for archiving.
Perhaps another option is a protection mode suitable for kernels. This might serve to support combinations of OSes and applications with incestuous relationships. I hope that most applications would run on the platform with at most a few standard libraries implemented upon the platform without features like multi-threading or any other handshaking with the kernel. I should think that stack discipline is purely an application issue and that the instruction set would support all the common stack ideas.
Don Knuth’s MIX architecture should be considered.
I purposely avoid a design that wants to be implemented in silicon, for that would soon drag the design in the direction of complexity; simplicity is the prime goal here.