Today’s machines are scarcely changed from von Neumann’s architecture, in which a CPU accesses RAM by specifying numbers called ‘addresses’ to locate data. Williams tubes and core memories have come and gone over 60 years, but the relation of the CPU to memory is about the same. Cache has extended this plan for about 30 years.

I grope here for ideas largely devoid of this plan. The result does not yet achieve even the status of a cartoon architecture. This architecture is still too much like an Edward Gorey story: too many loose ends! Perhaps others will steal some ideas. For now there are more questions than answers, and the answers are very vague.

There may be enough hardware notions here to design algorithms, or to map familiar algorithms onto. Such designs will feed back into hardware ideas. Data processing was once done where the data lived: as streams on magnetic tapes. Knuth’s ‘Sorting and Searching’ captures much of that very un-obvious lore.

Imagine writing your favorite program in machine language with few or no load and store operations. Alternatively, try to imagine your program expressed in a language with few or no references (pointers), or even lvalues. In C there is little so fundamental as these. Haskell is such a language, but the art of Haskell platforms is based firmly on the von Neumann architecture. The Actors language is another source of ideas, but like Haskell, its implementations have been for von Neumann machines. I am trying to learn Pict now. These languages suggest semantics but not the design of hardware. Sisal hints at computer architecture.

Instead of frequent references to RAM, I presume that a stream construct would come to carry the data flow between program modules. A nexus would persist for a time, holding the source of a few streams and the sink of some others.
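A minimal sketch of the nexus idea, with Go channels standing in for hardware streams and a goroutine standing in for the nexus; the arithmetic and all names here are illustrative assumptions of mine, not part of any proposal:

```go
package main

import "fmt"

// A nexus persists for a while, holding the sinks of some streams
// and the sources of others. A goroutine plays the nexus here and
// channels play the streams.
func nexus(in1, in2 <-chan int, out chan<- int) {
	for a := range in1 {
		b := <-in2
		out <- a + b // some computation on the incoming streams
	}
	close(out)
}

func main() {
	in1 := make(chan int)
	in2 := make(chan int)
	out := make(chan int)
	go nexus(in1, in2, out)
	go func() {
		for i := 0; i < 3; i++ {
			in1 <- i
			in2 <- 10 * i
		}
		close(in1)
		close(in2)
	}()
	for v := range out {
		fmt.Println(v) // 0, 11, 22
	}
}
```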

When we think of hardware, myriad questions arise, and they bear strongly on efficiency. There are other problems with possibly less impact on efficiency that are nonetheless absolutely necessary to solve. These are the data-flow ideas and insights, presented here as a software discipline; I recall them being originally presented as a source of hardware design ideas. I am not aware that they have ever had more than very local influence on real hardware. You can find them in only fragmentary form in modern machines.

Of course many or perhaps most streams will have at least one end in RAM where indexing units will administer some common RAM data patterns.

Streams must be as reliable as memory references: no dropping data at the convenience of the hardware supporting the stream. I presume that the source of a stream may not produce values at any guaranteed rate, and the consumer may not be ready to receive as fast as the stream can deliver. Perhaps congested streams can buy temporary RAM space; alternatively they must throttle the producer. At some granularity these issues must fall back to software; how is the state presented to such software?
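Here is a sketch of those two reliefs, with a Go channel standing in for the stream’s queue; the queue depth and the printed report are assumptions of mine. A blocking send is the throttle, and a failed non-blocking send is one way the congestion state might be presented to software:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	stream := make(chan int, 8) // 8-slot queue standing in for stream hardware
	done := make(chan bool)

	// Slow consumer: not ready to receive as fast as the stream delivers.
	go func() {
		for v := range stream {
			time.Sleep(time.Millisecond)
			_ = v
		}
		done <- true
	}()

	for i := 0; i < 100; i++ {
		select {
		case stream <- i:
			// accepted immediately by the stream hardware
		default:
			// Queue full: this is the congestion state as seen by software.
			// Here we simply throttle by blocking; a real system might
			// instead buy temporary RAM space for the backlog.
			fmt.Println("throttled at", i)
			stream <- i
		}
	}
	close(stream)
	<-done // no value was ever dropped
}
```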

What is the model of the calculation that a debugger can sense and explore for the person debugging such software?

I look to the Actors language for concepts with which to name streams and nexi. This leads naturally to capability security, though it does not directly address issues of hardware resources.

Neither the lifetime of a nexus nor that of a stream may be assumed to span the other’s. We probably require conceptual streams that are not permanently attached to particular stream hardware, just as modern operating systems provide the concept of a process not permanently attached to particular CPUs or pages of RAM. This concept need not be provided by hardware, but neither can the hardware preclude it, as early hardware precluded flexible allocation (by the OS) of threads to CPUs.
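A sketch of such a conceptual stream, under my own assumed names; the stable id is the nameable identity, and only the transport field is tied to particular (here simulated) hardware:

```go
package main

import "fmt"

// A conceptual stream carries an identity that outlives any particular
// piece of stream hardware, just as a process outlives its residence on
// a particular CPU.
type LogicalStream struct {
	id        uint64      // stable, nameable identity
	transport chan []byte // current hardware attachment (stand-in)
}

// Migrate detaches the stream from one transport and attaches another,
// preserving queued data so that nothing is dropped in transit.
func (s *LogicalStream) Migrate(fresh chan []byte) {
	old := s.transport
	s.transport = fresh
	close(old)
	for v := range old { // drain what the old hardware still held
		fresh <- v
	}
}

func main() {
	s := &LogicalStream{id: 42, transport: make(chan []byte, 4)}
	s.transport <- []byte("in flight")
	s.Migrate(make(chan []byte, 4))
	fmt.Printf("stream %d carries %q after migration\n", s.id, <-s.transport)
}
```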


ATM (Asynchronous Transfer Mode)

ATM drops cells upon congestion. That won’t do here, any more than endowing a load instruction with statistical semantics would. It is unclear how to fix this. Backpressure (flow control) is necessary, and link-level backpressure seems unavoidable. Tymnet did this, so it is possible! My bias towards ATM stems mainly from its perceived simplicity, which should yield low latency, small hardware, and a simple definition.
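One known way to get link-level backpressure without dropping cells is credit-based flow control. I won’t claim this is precisely Tymnet’s mechanism; treat the sketch below as an assumed scheme:

```go
package main

import "fmt"

// Credit-based link flow control: the sender may transmit only while it
// holds credits; the receiver returns one credit per cell it actually
// consumes. No cell is ever sent into a full queue, so none is dropped.
func main() {
	const window = 4
	link := make(chan string, window) // the link's cell queue
	credits := make(chan struct{}, window)
	for i := 0; i < window; i++ {
		credits <- struct{}{} // initial credit grant
	}

	done := make(chan bool)
	go func() { // receiver
		for cell := range link {
			fmt.Println("received", cell)
			credits <- struct{}{} // consuming a cell returns a credit
		}
		done <- true
	}()

	for i := 0; i < 10; i++ { // sender
		<-credits // blocks when the window is exhausted: backpressure
		link <- fmt.Sprintf("cell-%d", i)
	}
	close(link)
	<-done
}
```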

Stream Hardware

I assume that a stream will generally need to flow thru several pieces of hardware devoted to moving and switching streams; this is the general networking discipline. I will refer to hardware that switches as ‘nodes’, much like those in macroscopic networks. I suspect that nodes and nexi have jobs sufficiently different that different hardware designs are required. I assume that routing is done by software. Stream concatenation seems necessary at the application level, but concatenation leads to awkward, inefficient streams. Perhaps hardware counters can find awkward streams, which software can then improve, much as garbage collection improves storage efficiency.
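A sketch of application-level concatenation and of the counter that might expose an awkward stream; the splice itself is left to the software that reads the counter, and all names here are my own:

```go
package main

import "fmt"

// Concatenation at the application level: a goroutine copies one stream
// after another into a single output. Every value pays an extra hop,
// which the counter makes visible; software noticing a large count
// might splice the producers directly to the consumer instead.
func concat(out chan<- int, hops *int, ins ...<-chan int) {
	for _, in := range ins {
		for v := range in {
			*hops++  // the hardware counter this page imagines
			out <- v // the extra hop that makes the stream awkward
		}
	}
	close(out)
}

func main() {
	a, b, out := make(chan int), make(chan int), make(chan int)
	hops := 0
	go concat(out, &hops, a, b)
	go func() { a <- 1; a <- 2; close(a); b <- 3; close(b) }()
	for v := range out {
		fmt.Println(v)
	}
	fmt.Println("extra hops paid:", hops) // evidence the stream is awkward
}
```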

RAM

It would be foolish to entirely banish RAM; I don’t know how to anyway. To represent a variable permutation of 256 elements, as in the RC4 algorithm, RAM is clearly the answer. RAM is not only conventional but sometimes truly convenient. I don’t know how often cache logic is strategic. Some of IBM’s microcoded mainframes held their microcode in an associative cache memory since the entire microcode had grown enormously. This was presumably a strategic decision, not merely something to preserve legacy microcode, of which there was little. The plan of this page is merely to remove RAM from the center of the design.
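RC4’s key schedule illustrates the point: each step reads and swaps two elements of the permutation at computed indices, exactly the access pattern indexed RAM serves well and streams do not. A minimal sketch:

```go
package main

import "fmt"

// RC4 key schedule: S is a variable permutation of 256 elements.
// Every step indexes S at two computed positions and swaps them,
// an access pattern that indexed RAM serves directly.
func rc4Schedule(key []byte) [256]byte {
	var S [256]byte
	for i := range S {
		S[i] = byte(i)
	}
	j := 0
	for i := 0; i < 256; i++ {
		j = (j + int(S[i]) + int(key[i%len(key)])) & 0xFF
		S[i], S[j] = S[j], S[i] // the RAM-shaped operation
	}
	return S
}

func main() {
	S := rc4Schedule([]byte("key"))
	fmt.Println(S[:8]) // first few entries of the permutation
}
```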

Protection

I presume that protection is required for broad applications. This means that access to (influence over) nexi and streams is limited to those nexi that receive explicit authority to access them. Perhaps such authority is transmitted thru dedicated streams, or alternatively thru demarcated portions of the content of normal streams. Joule might be described as passing access to streams, in streams. This is a difficult issue for hardware design.
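Go channels happen to model the Joule idea directly, since a channel may itself travel over another channel. The sketch below passes access to a stream through a stream; holding the receive end of the carrying stream is the only way to acquire the authority:

```go
package main

import "fmt"

// Passing access to streams, in streams: receiving from ctrl is the
// only way to learn of, and thus use, the data stream. Authority
// travels as a demarcated value inside a normal stream.
func main() {
	ctrl := make(chan chan string) // a stream that carries stream capabilities

	go func() {
		data := make(chan string, 1)
		data <- "secret payload"
		close(data)
		ctrl <- data // transmit authority over `data` thru `ctrl`
		close(ctrl)
	}()

	data := <-ctrl      // receive the capability
	fmt.Println(<-data) // and only now can we exercise it
}
```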

Power

This is partly related to the above. It was recently forcefully brought to my attention that the cost of power is enormous once you monetize the inconvenience of finite battery lifetime. It is easy to find cases where the cost of supplying power to a transistor is many times the cost of buying it. This warrants much more vigorous power management than has been seen in consumer electronics, mainly laptops and cell phones. I want a design where the number of transistors powered at once seldom exceeds 2%. I studied implementing RC4 recently and decided that 10^4 custom transistors might do the job as fast as a 10^9-transistor general-purpose CPU.

Bates describes the same dilemma.