General Points
By this we mean that we do not know how to construct a program E such that if E is given read access to S, E can construct a new capability that will satisfy R as well as S does. (When we have thought about this problem for a while we will try for read-write access.)
Although we fail to solve the general problem, the incomplete attempts described below are a source of ideas for specialized solutions.
We now wish to cause R to operate on a new machine M2. We assume that the operating system on M1 can support access to a communications facility by a program on M1. We assume that the operating system on M1 may not be extended beyond what we have now described.
We now wish to install a program E in M1 that is given read-only access to S and access to a communication link to M2. We can make any assumptions about the operating system in M2 that we know how to implement. How are we to design E and the operating system in M2 so that R may be executed in M2 without modification to R?
We shall impose a few performance restrictions on the problem as we go. If we were willing to cause M2 to simulate each instruction of R and to request each storage location value from E as R was interpreted, R would run, but five orders of magnitude slower than on M1.
We shall ease some of the restrictions without coming to an entirely satisfactory solution.
In this section we present a tree of partial solutions. At the root of this tree (i.e., common to all of the schemes) is the idea of a cache of pages at M2 that appear directly in R's address space. The program whose logic we are designing is able to add and remove pages from this cache and to gain control when R references a page that is absent from the cache.
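To make the root idea concrete, here is a minimal sketch, in C, of the fault-handling logic we imagine at M2. The names request_page_from_E, install_page, and resume_R are hypothetical; they stand for facilities the operating system in M2 would have to provide, and the sketch assumes only that a missing page is fetched from E and mapped into R's address space before R continues.

/* Hypothetical sketch of the M2-side fault handler for the page cache.
 * request_page_from_E(), install_page(), and resume_R() are assumed
 * facilities of M2's operating system, not existing interfaces. */

#include <stdint.h>

#define PAGE_SIZE 4096

/* Ask E (over the communication link) for a copy of page page_no of S. */
extern void request_page_from_E(uint32_t page_no, uint8_t buf[PAGE_SIZE]);

/* Map the copied page into R's address space at page_no. */
extern void install_page(uint32_t page_no, const uint8_t buf[PAGE_SIZE]);

/* Resume R at the faulting instruction. */
extern void resume_R(void);

/* Entered by M2's kernel when R references a page absent from the cache. */
void page_fault_handler(uint32_t faulting_page)
{
    uint8_t buf[PAGE_SIZE];

    request_page_from_E(faulting_page, buf);  /* E reads S and replies */
    install_page(faulting_page, buf);         /* page now appears directly in R's space */
    resume_R();                               /* R proceeds unmodified */
}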
An immediate extension of this scheme allows R to run with n pages with the help of another M1 primitive to make instantaneous copies of n pages at once. This scheme works if the headway made by R between page faults is large compared with the cost of making the copies. Such is not always the case.
Except for performance it would suffice for E to be able to make an instantaneous copy of all pages at M1 that correspond to pages in the cache at M2. This effort, however, is quadratic in the working set size: each fault requires copying every page already in the cache, so building a working set of w pages costs on the order of w² page copies.
Another way of describing the difficulty is to note that having read access to S is not sufficient to sense the order of changes to S, but this order must be preserved in the changing image of S maintained at M2.
These primitives are strange and may be difficult to implement. But they provide the holder of the capability with no extra information that could otherwise have been considered safely hidden. These functions can be provided to segments by segment keepers, though this may not provide sufficient performance in the critical applications mentioned here.
One scheme that uses these primitives is as follows:
E keeps a page version key for each page in the cache at M2. Every time a new page is demanded at M2, E creates a page version key for that page, adds it to its list, and then copies the page and transmits it. M2 incorporates the new page in the cache, but R is not restarted yet. E then checks all page version keys (including the new one) and signals M2 to delete any pages whose keys are not current. R may now be restarted. This scheme causes R at M2 to run on stale data, but the changes to S that R sees are at least in the right sequence.
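The following sketch, again in C and again with invented names (make_version_key, key_is_current, copy_page, and the send_* calls are assumptions, not existing primitives), shows the order of steps E would take on each demand from M2 under this scheme.

/* Hypothetical E-side sketch of the page-version-key scheme.  The table
 * is assumed large enough for R's working set; eviction is omitted. */

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE  4096
#define MAX_CACHED 256

typedef struct { uint32_t page_no; uint64_t version_key; } cache_entry;

static cache_entry cache[MAX_CACHED];
static int         cached = 0;

extern uint64_t make_version_key(uint32_t page_no);     /* assumed M1 primitive */
extern bool     key_is_current(uint64_t version_key);   /* assumed M1 primitive */
extern void     copy_page(uint32_t page_no, uint8_t buf[PAGE_SIZE]); /* via read access to S */
extern void     send_page_to_M2(uint32_t page_no, const uint8_t buf[PAGE_SIZE]);
extern void     send_delete_to_M2(uint32_t page_no);
extern void     send_restart_to_M2(void);

/* Called when M2 reports that R demanded page page_no. */
void handle_demand(uint32_t page_no)
{
    uint8_t buf[PAGE_SIZE];

    /* Create a version key for the page and add it to the list. */
    cache[cached].page_no     = page_no;
    cache[cached].version_key = make_version_key(page_no);
    cached++;

    /* Copy the page and transmit it; M2 installs it but does not restart R. */
    copy_page(page_no, buf);
    send_page_to_M2(page_no, buf);

    /* Check all version keys, including the new one, and have M2 delete
     * any page that is no longer current, so that the order of changes
     * to S is preserved in the image R sees. */
    int kept = 0;
    for (int i = 0; i < cached; i++) {
        if (key_is_current(cache[i].version_key))
            cache[kept++] = cache[i];
        else
            send_delete_to_M2(cache[i].page_no);
    }
    cached = kept;

    /* Only now may R be restarted. */
    send_restart_to_M2();
}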
Elaborations on this scheme may limit the degree of time lag that R sees.
The ability to demand prior notification, however, is far more than the ability to read. If such an ability were replicated widely, then the primitive page construct would find itself burdened with remembering an indefinitely long list of programs to notify. Further, there is the dilemma of how to notify. Passive notification merely sets a bit and waits for the notifyee to notice it. This is clearly no better than the previous scheme. The other sense of notification is to wait for acknowledgment from the notifyee. If the notifyee is buggy or malicious, the acknowledgment may never come. At this point we have allowed the reader enough power to entirely block the progress of the writer of the segment. This is only marginally less serious than destroying W.
Several schemes have been proposed to ameliorate these problems.
A trusted program CPE (certified prompt keys) will produce on demand a start key that is widely trusted to be prompt. CPE always provides the code for the program behind the start key. The start key will only do a very small class of operations. One such operation might be the rescinding of a capability. Another might be to enqueue a short message on a communication link. These two types of prompt key support the cases that have motivated the design of the system.
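As an illustration only (the order codes, rescind_capability, and enqueue_message below are invented), the code that CPE might place behind such a start key could look like the following: every path does a small, bounded amount of work and never waits, which is the property being certified.

/* Illustrative sketch of a dispatcher of the kind CPE might certify as
 * prompt.  Nothing here is an existing interface; the point is only
 * that each operation is short and non-blocking. */

#include <stddef.h>
#include <stdint.h>

enum { OP_RESCIND = 1, OP_ENQUEUE = 2 };

#define MAX_MSG 64   /* only short messages may be enqueued */

extern void rescind_capability(uint32_t cap_index);          /* assumed to run in bounded time */
extern int  enqueue_message(const uint8_t *msg, size_t len); /* assumed non-blocking */

/* Entered when the prompt start key is invoked. */
int prompt_key_invoke(int order, uint32_t cap_index,
                      const uint8_t *msg, size_t len)
{
    switch (order) {
    case OP_RESCIND:
        rescind_capability(cap_index);
        return 0;
    case OP_ENQUEUE:
        if (msg == NULL || len > MAX_MSG)
            return -1;    /* refuse rather than risk unbounded work */
        return enqueue_message(msg, len);
    default:
        return -1;        /* no other operations are offered */
    }
}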
A class of difficulty in this approach lies in CPE's determining that the capabilities handed to it are indeed quickly executed. For example, a capability to rescind access may not be manifestly prompt to CPE. CPE might establish a time-out, but this is itself time-consuming.
Another general problem is that any performance guarantee is a thing of many parameters. The caveats to the guarantee include: the disk controller being up, schedulers (user replaceable?) doing what they advertise, etc.
Imagine program 1 on processor 1: (STORE 10000; FETCH 20000) and program 2 on processor 2: (STORE 20000; FETCH 10000). We say that program 1 sees program 2 if 1's fetch from 20000 gets the value stored by 2's store. Likewise for 2 seeing 1. We might naïvely imagine three possible orders of execution: 1 sees 2 but 2 doesn't see 1, 2 sees 1 but 1 doesn't see 2, or 1 and 2 see each other. On a 370 it may be that neither sees the other. We believe that real 370's limit this effect to time scales of a few microseconds; the hardware caches never hold the only valid value for some real storage for more than a few microseconds.
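For concreteness, here is the example rendered as a C11 litmus test; x and y stand for locations 10000 and 20000, and relaxed atomics stand in for ordinary 370 stores and fetches. Run many times, such a test can exhibit the r1 == 0, r2 == 0 outcome in which neither program sees the other.

/* Store-buffering litmus test corresponding to the two programs above. */

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int x, y;   /* locations 10000 and 20000 */
int r1, r2;        /* values returned by the two fetches */

int program1(void *arg)   /* STORE 10000; FETCH 20000 */
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return 0;
}

int program2(void *arg)   /* STORE 20000; FETCH 10000 */
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return 0;
}

int main(void)
{
    thrd_t t1, t2;
    thrd_create(&t1, program1, NULL);
    thrd_create(&t2, program2, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    /* r1 == 1 means program 1 sees program 2; r2 == 1 means 2 sees 1.
     * r1 == 0 and r2 == 0 is the case in which neither sees the other. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}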
It is possible to eliminate this effect by not running two processes with write access to the same page at once. We do not intend to judge whether the expense of this restriction is worthwhile, but merely to point out that programs may exist that assume that it will not occur. Such programs will work in multiprogrammed systems with a single processor and on multiprocessor systems if the delay between the store and fetch is larger than a few microseconds.
Some strategies for supporting distributed access to segments might take the same latitude as the 370 but allow much larger “skews” in time.
Such latitude merely says that stores of one program take place in the correct order as seen by any other program.
Programs written in strict accordance with the 370 design would work in this environment. But some programs that would work on any real 370 would fail in this environment.
This scheme requires trapping the 370 instructions that are designed to synchronize processes: CS, CDS, TS, and the synchronizing NOP.
The alternative to taking this latitude is to say that conceptually only one program is allowed to have write access to the data at once. See (p3,synseg).