When compiled code calls a routine, it allocates a frame on a stack, and a pointer to that frame serves as the continuation of the calculation. Here we explore whether that suffices to arrange the ordering of stream processing. (It does not.)

A push-style C routine, str2, that converts its input to upper case might read:
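#include <ctype.h>
void str3(char);
/* The cast avoids undefined behavior when char is signed and in is negative. */
void str2(char in){str3((char)toupper((unsigned char)in));}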
str3 is the downstream consumer. A good compiler should be able to optimize this tail call. With tail call optimization the compiled code would be equivalent to our code.

The stutter process should look like:
void str4(char);
void str3(char in){str4(in); str4(in);}
This routine leaves a stack frame allocated between the two calls to str4. If the stream becomes blocked there, the stuttered character remains stranded on the stack. Classical multithreading will not reacquire this stack frame when it is needed. Recall that there are no unified streams through these processing nodes to which threads may be allocated; instead there is a fabric of streams connecting the processing nodes.

The above C code is good for only one stream. A C++ version, which can be instantiated per stream, might look like:
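#include <cctype>
// Assumes a base class such as: struct stream { virtual void take(char in) = 0; };
class up : public stream {
  stream& down;  // downstream consumer
public:
  explicit up(stream& dn) : down(dn) {}
  void take(char in) override { down.take(std::toupper(static_cast<unsigned char>(in))); }
};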
This code serves multiple streams, but the stutter case still gets it wrong, because the character is again held in a stack frame between the two calls:
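// Assumes a base class such as: struct stream { virtual void take(char in) = 0; };
class stutter : public stream {
  stream& down;  // downstream consumer
public:
  explicit stutter(stream& dn) : down(dn) {}
  void take(char in) override {
    down.take(in);  // this frame stays allocated between the two calls
    down.take(in);
  }
};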

It seems clear to me that one must be able to directly name the continuation, as we do in the Pushy protocol. Scheme's call-with-current-continuation construct addresses this problem directly. The Pushy protocol does it without requiring garbage collection.
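As a rough illustration of what naming the continuation can mean in plain C, the sketch below keeps the stutter node's unfinished work in an ordinary struct instead of a stack frame. It is not the Pushy protocol itself. The names sink, stutter, stutter_take, stutter_resume and stutter_demo are all hypothetical, and the convention that a blocked consumer refuses a character by returning 0 is an assumption made only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical downstream consumer that refuses input when full. */
struct sink {
    char buf[16];
    int  len;   /* characters accepted so far */
    int  cap;   /* current capacity; raising it "unblocks" the sink */
};

/* Returns 1 if the character was accepted, 0 if the sink is blocked. */
int sink_take(struct sink *s, char c) {
    if (s->len >= s->cap) return 0;
    s->buf[s->len++] = c;
    return 1;
}

/* The stutter node's continuation, held as data rather than as a frame:
   which character is in flight and how many copies are still owed. */
struct stutter {
    struct sink *down;
    char held;
    int  owed;
};

/* Resume from the saved state. Returns 1 once nothing is owed. */
int stutter_resume(struct stutter *st) {
    while (st->owed > 0) {
        if (!sink_take(st->down, st->held))
            return 0;            /* blocked; the state survives in *st */
        st->owed--;
    }
    return 1;
}

/* Accept one character. Call only when nothing is owed. */
int stutter_take(struct stutter *st, char in) {
    st->held = in;
    st->owed = 2;
    return stutter_resume(st);
}

/* Scenario: the sink blocks after three characters and is then unblocked.
   Copies what the sink received into out as a NUL-terminated string. */
void stutter_demo(char *out, size_t outsz) {
    struct sink s = { {0}, 0, 3 };
    struct stutter st = { &s, 0, 0 };
    int ok;

    ok = stutter_take(&st, 'a');       /* both copies fit */
    assert(ok == 1);
    ok = stutter_take(&st, 'b');       /* the second copy is refused */
    assert(ok == 0 && st.owed == 1);

    s.cap = (int)sizeof s.buf - 1;     /* downstream unblocks */
    ok = stutter_resume(&st);          /* the owed copy is delivered */
    assert(ok == 1);

    strncpy(out, s.buf, outsz - 1);
    out[outsz - 1] = '\0';
}
```

Because the owed character lives in the struct, any caller holding a pointer to it can finish the job later by calling stutter_resume. Nothing is stranded in a frame that only one thread can return to, and no garbage collector is needed to keep the state alive.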

See this about a scheme that efficiently discards the built-in continuation and provides its own.