A Little History

The first programmed concurrency that I observed was when IBM added “channels” to the 709 architecture. A stream of many words could move between tape and core without the CPU attending to each of those words as it moved. Previously the CPU’s attention was required because only it could compute addresses and access core; the channel included these abilities. Commands were available to test the channel for completion. A typical program would start the I/O, go on about its other work for as long as it could, and then, when it needed the data, loiter until the channel was done.

Shortly thereafter the interrupt was added to the architecture (7090 I think). This allowed the program to arrange to perform some function promptly upon completion of the I/O. A task that alternated between I/O and short CPU bursts could run concurrently and efficiently with a CPU-consuming task, and the latter would not need frequently executed tests scattered about its code to poll the I/O.

Already two paradigms existed:

Fine grained concurrency
The program overlapped an I/O operation with computation that the application logic closely associated with that I/O.
Coarse grained concurrency
Two processors, the CPU and the channel, proceed independently with their tasks without fine grained coordination. While the channel needs to steal the CPU periodically for brief times, this is a detail that does not impact the larger picture.
These hardware concurrency features were introduced without language support. Fortran programs requiring these features resorted to home-grown assembler code to accomplish whatever CPU switching was needed.

Today

The same gamut of applications persists today. A Unix application requiring fine grained concurrency usually settles for knowing that some other application can use the CPU profitably while it does I/O. Unix itself knows enough about sequential file access to read ahead or write behind. Applications with coarse grained requirements can even run in different address spaces.
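
One minimal sketch of the coarse grained case: two hypothetical jobs, io_heavy_work and compute_heavy_work, standing in for work that needs no fine grained coordination. fork gives each its own address space, and the only coordination is a wait at the end.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical placeholders for the two independent jobs. */
    static void io_heavy_work(void)      { /* e.g. copy a file */ }
    static void compute_heavy_work(void) { /* e.g. crunch numbers */ }

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }

        if (pid == 0) {               /* child: its own address space */
            io_heavy_work();
            _exit(0);
        }
        compute_heavy_work();         /* parent proceeds independently */
        waitpid(pid, NULL, 0);        /* the only coordination point */
        return 0;
    }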

Many applications have used the Unix “select” construct to fashion a style of concurrency mixed with exclusion control. The application logic can count on there being just one CPU executing code in the address space, and on that CPU redirecting its attention only at the select statement. If the application code arranges for the application data to be in a consistent state each time it reaches the select statement, concurrency becomes easy to manage.
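
The following sketch shows the shape of such a select loop, assuming two hypothetical descriptors fd_a and fd_b whose setup is omitted, and handlers that leave the application data consistent before returning to the loop.

    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    /* Hypothetical handlers: each leaves the application data in a
       consistent state before returning to the loop. */
    static void handle_a(int fd) { char buf[512]; (void)read(fd, buf, sizeof buf); }
    static void handle_b(int fd) { char buf[512]; (void)read(fd, buf, sizeof buf); }

    static void event_loop(int fd_a, int fd_b)
    {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(fd_a, &readable);
            FD_SET(fd_b, &readable);

            /* The single CPU redirects its attention only here. */
            if (select((fd_a > fd_b ? fd_a : fd_b) + 1,
                       &readable, NULL, NULL, NULL) < 0) {
                perror("select");
                return;
            }
            if (FD_ISSET(fd_a, &readable)) handle_a(fd_a);
            if (FD_ISSET(fd_b, &readable)) handle_b(fd_b);
        }
    }

Because no handler runs while another is in the middle of updating shared data, the loop itself provides the exclusion control mentioned above.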

This style cannot be adapted to multiple CPUs. Such application code does not manifest the parallelism typically inherent in the original problem. Still, it is a convenient style and is used by many useful legacy programs.

I think that “select” is at the heart of the Unix lightweight process.