Ameliorations

Mark cache lines (and TLB entries) that were loaded speculatively, and free them when the speculation fails. Sometimes this would actually speed things up.
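
The marking idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the class SpecCache and its methods are hypothetical names invented for illustration, the cache is modelled as an unbounded dictionary with no sets, ways or eviction, and TLB entries would be handled the same way.

```python
class SpecCache:
    """Toy cache that tags speculatively filled lines (illustrative only)."""

    def __init__(self):
        # line address -> True if the line was filled speculatively
        self.lines = {}

    def load(self, addr, speculative=False):
        """Return True on a hit; on a miss, fill the line and return False."""
        if addr in self.lines:
            if not speculative:
                # A committed load now needs the line, so drop the mark.
                self.lines[addr] = False
            return True
        self.lines[addr] = speculative
        return False

    def commit(self):
        """Speculation succeeded: marked lines become ordinary lines."""
        self.lines = dict.fromkeys(self.lines, False)

    def squash(self):
        """Speculation failed: free every line still marked speculative."""
        self.lines = {a: s for a, s in self.lines.items() if not s}


cache = SpecCache()
cache.load(0x1000)                     # ordinary miss, line is filled
cache.load(0x2000, speculative=True)   # speculative miss, filled and marked
cache.squash()                         # mis-speculation: 0x2000 is freed
probe_hit = cache.load(0x2000)         # False: the probe sees a miss again
```

In this model a later timing probe of 0x2000 misses, so the failed speculation leaves no trace in the cache.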
Another Amelioration:

Make some (all?) of the instructions whose semantics concern cache lines contingent on a bit that only the kernel can set. Architecturally, these instructions are all no-ops. Without the bit set, they would really be no-ops. Only those rare programs with a legitimate need for these ops could actually execute them. Also limit these ops to addresses to which the program has write access.
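
A toy model of the gating might look like the following. It is a sketch only: Core, set_cache_ops_bit and flush_line are hypothetical names, the in_kernel flag stands in for a real privilege check, and a cache-line flush stands in for the whole family of cache-line instructions.

```python
class Core:
    """Toy core whose cache-line ops are gated by a kernel-only bit."""

    def __init__(self, writable):
        self.cache = set()               # addresses of currently cached lines
        self.writable = set(writable)    # addresses the program may write
        self.cache_ops_enabled = False   # hypothetical kernel-controlled bit

    def set_cache_ops_bit(self, enabled, in_kernel):
        """Only the kernel may change the enabling bit."""
        if not in_kernel:
            raise PermissionError("only the kernel can set this bit")
        self.cache_ops_enabled = enabled

    def flush_line(self, addr):
        """Flush one line; return True only if the op really took effect."""
        if not self.cache_ops_enabled:
            return False                 # bit clear: a true no-op
        if addr not in self.writable:
            return False                 # no write access: also a no-op
        self.cache.discard(addr)
        return True


core = Core(writable={0x1000})
core.cache.update({0x1000, 0x2000})
core.flush_line(0x1000)                       # ignored, bit is clear
core.set_cache_ops_bit(True, in_kernel=True)  # kernel grants the ops
core.flush_line(0x2000)                       # ignored, no write access
core.flush_line(0x1000)                       # takes effect
```

The instruction never faults in this sketch; it silently does nothing, which matches the observation that the ops are architecturally no-ops anyway.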