Coroutine

Management of Coroutines

The technology described below is in the same category as a compiler. Like a compiler, it depends on the ABI and certainly on the CPU. Like most compilers, it uses the assembler directly, because the necessary code cannot be expressed in C. This particular version is debugged on Mac OS with clang. The first language other than assembler that I know of that could express first-class coroutines is Scheme. More recently, the Go language expresses coroutines.

I give the design here in English. There are two bodies of code, known here only as left and right. Left contains call sites to PutGetL and right contains call sites to PutGetR. In general one of these bodies is running and the other is waiting at one of its PutGetX call sites. When the running body reaches a PutGetX call site, the argument it passes becomes the value of the pending PutGetX expression in the other body; that body then proceeds while the first awaits a value. Thus a value of type vR is transferred when left calls PutGetL, and it becomes the value of PutGetR where right is waiting for such a value; conversely, a value of type vL passed to PutGetR becomes the value of left’s pending PutGetL. x.h is a universal (for now) header file.
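The exchange just described can be tried without any assembler by letting POSIX ucontext do the stack switching. This is a swapped-in technique, not the hand-written PutGet.s: ucontext is obsolescent in POSIX (still provided by glibc, and deprecated on macOS), and swapcontext saves more state than the xchg version, typically including the signal mask. The choice of long for vL and vR, the running-total right body, and the bodies of startR and begin are assumptions for this sketch; begin follows the launch plan described further on in these notes, returning the first vL that right sends with its first PutGetR.

```c
#define _XOPEN_SOURCE 700      /* macOS requires this before <ucontext.h> */
#include <stdlib.h>
#include <ucontext.h>

typedef long vL;               /* hypothetical: value travelling right -> left */
typedef long vR;               /* hypothetical: value travelling left -> right */

static ucontext_t ctxL, ctxR;  /* saved execution states of the two bodies */
static vL toL;                 /* value in transit to left */
static vR toR;                 /* value in transit to right */

/* Left's call site: deliver x as the value of right's pending PutGetR,
   then sleep until right calls PutGetR again. */
vL PutGetL(vR x) {
    toR = x;
    swapcontext(&ctxL, &ctxR);
    return toL;
}

/* Right's call site: the mirror image of PutGetL. */
vR PutGetR(vL x) {
    toL = x;
    swapcontext(&ctxR, &ctxL);
    return toR;
}

/* Hypothetical right body: replies with the running total of what it gets. */
static void startR(void) {
    vL total = 0;                  /* right initializes its state ... */
    for (;;) {
        vR x = PutGetR(total);     /* ... and its first PutGetR ends initiation */
        total += x;
    }
}

enum { R_STACK_SIZE = 64 * 1024 };

/* Launch right on a fresh stack; return the first vL that right sends. */
vL begin(void) {
    void *stack = malloc(R_STACK_SIZE);
    if (stack == NULL) abort();
    getcontext(&ctxR);
    ctxR.uc_stack.ss_sp = stack;
    ctxR.uc_stack.ss_size = R_STACK_SIZE;
    ctxR.uc_link = NULL;           /* startR never returns in this model */
    makecontext(&ctxR, startR, 0);
    swapcontext(&ctxL, &ctxR);     /* run startR up to its first PutGetR */
    return toL;
}
```

With this right body, begin() returns 0, after which PutGetL(5) returns 5 and PutGetL(7) returns 12: each value left sends becomes the value of right’s waiting PutGetR, and right’s next PutGetR argument becomes the value of left’s PutGetL.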

Since I am not current with all of the assembler directives for the x86, I write a dummy C program that has the correct signatures and references the right global variables. The source file for such a dummy program has a name ending in “D.c”, so that compiling it with the -S option will not overwrite my only hand-edited assembler file. Compiling the dummy program with -S produces the assembler input for that program, in a file whose name ends in “D.s”. I edit that file to produce an assembler program, with a name ending in “.s”, that corresponds to no C program.
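For concreteness, here is what such a dummy might look like. Everything in it is an assumption: the name PutGetD.c matches the compile command used in these notes, but the choice of long for vL and vR (really defined in x.h) and the placeholder bodies are mine. What matters is only the signatures and the reference to the global cell other inside dmy, since the point is to read the directives and the addressing of other that clang emits with -S.

```c
/* PutGetD.c (hypothetical dummy): compiled only with -S so that the
   generated PutGetD.s supplies directives and symbol addressing to copy.
   The bodies are placeholders; the real switching is hand-written in PutGet.s. */
typedef long vL;                   /* assumed; the real definitions live in x.h */
typedef long vR;

void *other;                       /* cell holding the sleeping body's stack pointer */

vL PutGetL(vR x) { return (vL)x; } /* correct signature, dummy body */
vR PutGetR(vL x) { return (vR)x; } /* correct signature, dummy body */

/* Exists only to show how the compiler addresses `other`;
   its code is deleted from the hand-edited .s file. */
void dmy(void *p) { other = p; }
```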

The conceit is to compile a routine like this with a shell command:

clang PutGetD.c -O3 -Wmost -S

to produce a file like this. Note that the machine instructions for PutGetL and PutGetR are identical for the current choices of vL and vR. Perhaps a single instruction such as xchg other, %rsp will suffice to exchange the stack pointer. The push and pop of RBP serve to restore that register to the value it had when the body now waking up made its own PutGet call. It is scary that those two instructions are not needed in the context seen by the compiler but are needed in our modified context.

Are there other registers that the caller expects to be preserved? Does this routine not save them because the caller is expected to, or because this routine does not use them? The same question goes for the exotic XMM and MMX registers, etc. Seeds of confusion are found here: Microsoft, turmoil. According to page 10 of this, we must preserve RBX, RBP, and R12 through R15. Clang produced code that did not save these because it did not use them.

Clang compiled the useless instruction movq %rsp, %rbp along with the push and pop instructions for RBP. Curiously, the latter are necessary in our variation. First we remove the code for dmy, which was compiled merely to learn how to access the cell other. Then we replace both movq instructions with

push %r12
push %r13
push %r14
push %r15
xchg _other(%rip), %rsp
pop %r15
pop %r14
pop %r13
pop %r12

That gives us this, which we compile thus:

clang PutGet.s -c
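The single cell other is enough because, at every moment, it holds the saved stack pointer of whichever body is asleep. The plain-C model below (nothing is actually switched, and xchg_other is a hypothetical name) mimics the effect of xchg _other(%rip), %rsp to make that invariant visible. Because the pushes of R12 through R15 go onto the running body’s stack before the exchange and the pops come off the other body’s stack after it, each body’s callee-saved registers travel with its own stack. As an aside, an x86 XCHG with a memory operand is implicitly locked, so it may cost more than an ordinary move.

```c
/* Model of `xchg _other(%rip), %rsp`: the running body's stack pointer
   goes into `other`, and the sleeping body's stack pointer comes out.
   Stack pointers are modelled as opaque pointers; no stack is switched. */
static void *other;               /* saved stack pointer of the sleeping body */

/* One PutGet transfer: returns the stack pointer to continue on. */
void *xchg_other(void *rsp) {
    void *sleeping = other;       /* the other body's parked stack */
    other = rsp;                  /* park the current body's stack */
    return sleeping;
}
```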

To proceed with the logic of initiation, we take a nearly useless and trivial ‘right’ program:

R.c is perhaps the simplest useful coroutine. If its correspondent (L.c) had known R.c’s behavior when L.c was written, R.c’s function could simply have been put inline; otherwise it could not.
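To illustrate what “put inline” would mean, suppose (hypothetically; the real R.c is not reproduced here) that R keeps a running total of the values it receives. A left body that knew this could replace every round trip through PutGetL with an ordinary function call, keeping R’s state in a static variable instead of on R’s own stack. The name R_inline and the long value types are assumptions. The inversion is easy only because R’s single call site sits in one loop; a body with PutGetR calls scattered through nested control flow does not invert so neatly, and that is the case coroutines are for.

```c
/* Hypothetical coroutine form of R:
 *     vL total = 0;
 *     for (;;) { vR x = PutGetR(total); total += x; }
 * Inverted into an ordinary function, R's local state becomes static. */
typedef long vL;            /* assumed value types */
typedef long vR;

static vL total;            /* formerly a local variable on R's own stack */

/* Replaces one round trip through PutGetL: takes what left would send,
   returns what R would have sent back. */
vL R_inline(vR x) {
    total += x;
    return total;
}
```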

Simple code in main.c is about the simplest coroutine to pair with R.c to verify behavior. How do we introduce the two to each other? Note that the cell other is uninitialized. If we merely run main, it will call PutGetL, which will switch to whatever garbage is in other as though it were a stack pointer. In general we also need two stacks.
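For the second stack, the System V AMD64 ABI requires that at function entry, just after call has pushed the return address, RSP mod 16 equals 8. Code entered on the new stack by a jump rather than a call, such as a tail call to startR, should find the same condition. The helper below is a minimal sketch under those assumptions (new_stack is a hypothetical name, and malloc stands in for any allocator): it allocates the memory and computes an initial stack-pointer value with one word reserved as the return-address slot.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes for a new stack and return the value RSP should
   hold on entry to the first function run on it: top of the block, rounded
   down to 16 bytes, minus one 8-byte return-address slot, so RSP % 16 == 8.
   The block itself is returned through base_out so it can be freed later. */
void *new_stack(size_t size, void **base_out) {
    char *base = malloc(size);
    if (base == NULL) return NULL;
    *base_out = base;
    uintptr_t top = (uintptr_t)(base + size);
    top &= ~(uintptr_t)15;   /* 16-byte boundary at or below the end */
    top -= 8;                /* slot where a return address would sit */
    return (void *)top;
}
```

Since startR never returns in the model described here, the reserved slot only has to exist; its contents are never used as a real return address.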

I do not see a symmetric way to launch the coroutines. My plan is for a program to call the routine begin, which animates the other code, and then to become the ‘left’ code body. The other code must normally initialize its state and then become the right code body. The first execution of PutGetR ends the initiation phase: it leads back into begin, which returns the first vL value as the value of begin.

There is something spooky that we must reason about. It is a little more than the state of a stack: it is that state plus all the unprivileged register values, which we call the execution state (ES). At some point ES is duplicated, and we must be sure that enough, but not too much, state is duplicated. I make this observation because I wrote the initial code without thinking about this.
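A familiar standard-C analogy for duplicating part of ES is setjmp/longjmp: setjmp typically records the stack pointer, program counter and callee-saved registers, but not the contents of the stack. After longjmp, a variable kept in memory retains its latest value, whereas the C standard makes a non-volatile automatic variable modified after setjmp indeterminate, because it may have lived in a register that was rolled back. The sketch below (count_resumptions is a hypothetical name) marks its counter volatile so that it lives in the stack frame and survives each jump; this is an analogy, not the mechanism used in these notes.

```c
#include <setjmp.h>

static jmp_buf snapshot;        /* registers only; the stack is not copied */

/* Jumps back to the snapshot until `limit` resumptions have happened. */
int count_resumptions(int limit) {
    volatile int n = 0;         /* in memory, so longjmp does not roll it back */
    (void)setjmp(snapshot);     /* standard-permitted use as an expression statement */
    if (n < limit) {
        n++;
        longjmp(snapshot, 1);   /* this frame is still live, so the jump is valid */
    }
    return n;
}
```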

beginD.c refers to an imaginary external variable called “RSP”. Compiled thus:

clang beginD.c -O3 -Wmost -S

it produces the assembler source, which we edit by changing each reference to the nonexistent RSP into a reference to the stack pointer. That gets this, and here are the differences. We have trampled on some debugger conventions, I think.

Now we compile and run main and cringe:

clang main.c begin.s R.c PutGet.s -Wmost -O3

The funny stuff begins when we call begin. A few register values from main are saved on the old stack. We then do a tail call to startR, which soon calls PutGetR, which resurrects the old state with those saved values. Note that startR never returns in this model. We might want to change this as we decide how to tear down the coroutine and give back its stack.

In general there is some setup peculiar to the server, but in this case there is nothing to do except initialize c, in the code at serv.c.

In this case a new instance of a coroutine is created by a call to makeI.
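The following sketch combines instance creation with the teardown question raised above. Each instance gets its own stack and its own pair of saved states, so the role of the global cell other becomes per-instance, and uc_link returns control to the creator when the body returns, after which the stack can be given back. It again uses POSIX ucontext instead of the hand-written assembler; the structure Inst, the instance argument to PutGetL, the running-total body, the convention that 0 ends the body, and freeI are all hypothetical, and only the names makeI and c come from these notes.

```c
#define _XOPEN_SOURCE 700      /* macOS requires this before <ucontext.h> */
#include <stdlib.h>
#include <ucontext.h>

typedef long vL;               /* hypothetical value types, as in x.h */
typedef long vR;

/* One coroutine instance: its own stack plus both saved execution states. */
typedef struct Inst {
    ucontext_t caller;         /* left side, parked while the body runs */
    ucontext_t self;           /* the body, parked while left runs */
    void *stack;
    long c;                    /* server-specific state (cf. serv.c) */
    vL toL;                    /* value in transit to left */
    vR toR;                    /* value in transit to the body */
    int done;                  /* set just before the body returns */
} Inst;

static Inst *current;          /* instance whose body is running */

/* Switch into instance i until its body calls PutGetR or returns. */
static void resume(Inst *i) {
    Inst *saved = current;
    current = i;
    swapcontext(&i->caller, &i->self);
    current = saved;
}

/* Called by a body: send x to left, wait for the next value. */
vR PutGetR(vL x) {
    Inst *me = current;
    me->toL = x;
    swapcontext(&me->self, &me->caller);
    return me->toR;
}

/* Called by left: send x to instance i, wait for its reply. */
vL PutGetL(Inst *i, vR x) {
    i->toR = x;
    resume(i);
    return i->toL;
}

/* Hypothetical server body: running total in c; receiving 0 ends it. */
static void body(void) {
    Inst *me = current;
    vR x = PutGetR(me->c);     /* first PutGetR ends initiation */
    while (x != 0) {
        me->c += x;
        x = PutGetR(me->c);
    }
    me->done = 1;              /* returning now follows uc_link to caller */
}

enum { STACK_SIZE = 64 * 1024 };

/* Create and start an instance; *first receives the body's first vL. */
Inst *makeI(long c0, vL *first) {
    Inst *i = calloc(1, sizeof *i);
    if (i == NULL || (i->stack = malloc(STACK_SIZE)) == NULL) abort();
    i->c = c0;                           /* serv.c-style initialization */
    getcontext(&i->self);
    i->self.uc_stack.ss_sp = i->stack;
    i->self.uc_stack.ss_size = STACK_SIZE;
    i->self.uc_link = &i->caller;        /* where control goes when body returns */
    makecontext(&i->self, body, 0);
    resume(i);
    *first = i->toL;
    return i;
}

/* Give back the stack; only valid once i->done is set. */
void freeI(Inst *i) {
    free(i->stack);
    free(i);
}
```

For example, makeI(100, &first) sets first to 100, and a following PutGetL(i, 1) returns 101; a second instance keeps its own independent total.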

Intel calls the stack pointer RSP.
See page 1356 in 325462-050US.
On page 1558 of the January 2015 manual: the last entry in the XCHG table.
XCHG other, RSP

This is useful. So is the ABI, which reminded me that upon routine entry RSP’s value mod 16 must be 8. Thanks.
Good conceptual intro to coroutines; I do not like his solution but his defense is worth reading.

Page 1407 of the manual describes the instruction ret.
“CFA” means “Canonical Frame Address”.
I cannot find information about the rôle of RBP in calling conventions beyond the requirement that it be preserved upon return. Is any code other than the code compiled for the current procedure concerned with the convention that RBP addresses a frame? Do occasional interrupt routines assume such a convention? Do they have any business peeking? Perhaps debuggers do? What is the contract about RBP at the point just before the first instruction of a subroutine has been executed? RBP is among the callee-saved registers, but it seems to have another function too. The following fails:

clang mn.c beg.s
./a.out

I presume that