Deterministic Replay

Here is an idea we had way back in 1966 on the IBM ACS project. There was an architected, privileged ‘instruction counter’ (not pointer) that counted down and trapped upon reaching 0. The kernel was to have implemented an execution mode that used this counter to record exactly where interrupts occurred. This mode was so cheap that the faster, non-recording alternative might not have been worth implementing. The recording supported another mode in which a collection of interleaved threads (multiprogramming) sharing read-write (RW) RAM could be deterministically replayed, including the points where time slicing had happened. By ‘multiprogramming’ we exclude two processors simultaneously sharing RW RAM.

This could be a powerful tool for finding bugs that occur only in production. It might also support a test mode that tries large classes of execution orders for multiprogrammed apps. Together with hooks to modally trap user-mode synchronizing operations, these ‘execution orders’ could be fashioned to torture the synchronization logic of applications. I don’t think this has been achieved on any deployed system.
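
The record and replay modes can be illustrated with a toy simulation. This is only a sketch under loud assumptions, not the ACS design: Python generators stand in for threads, each `yield` stands in for one instruction boundary, and the names `worker`, `run_slice`, `record` and `replay` are made up here. The point it tries to show is that logging nothing more than (thread, instruction count) per time slice is enough to reproduce a racy interleaving exactly.

```python
import random


def worker(shared, rounds):
    """Thread body. Each `yield` marks one simulated instruction boundary."""
    for _ in range(rounds):
        tmp = shared["x"]      # load
        yield
        shared["x"] = tmp + 1  # store; racy, another thread may have run since the load
        yield


def run_slice(thread, budget):
    """Load the count-down counter with `budget` and run until it reaches 0.

    Returns (instructions executed, whether the thread finished).
    """
    counter = budget
    while counter > 0:
        try:
            next(thread)
        except StopIteration:
            return budget - counter, True
        counter -= 1
    return budget, False  # counter hit 0: this is the simulated trap


def record(seed, n_threads=2, rounds=50):
    """Run with pseudo-random time slices and log where each slice ended."""
    shared = {"x": 0}
    threads = [worker(shared, rounds) for _ in range(n_threads)]
    live = list(range(n_threads))
    rng = random.Random(seed)  # stands in for unpredictable interrupt timing
    log = []
    while live:
        tid = rng.choice(live)
        executed, done = run_slice(threads[tid], rng.randint(1, 7))
        log.append((tid, executed))
        if done:
            live.remove(tid)
    return shared["x"], log


def replay(log, n_threads=2, rounds=50):
    """Re-run the same threads, switching exactly where the log says."""
    shared = {"x": 0}
    threads = [worker(shared, rounds) for _ in range(n_threads)]
    for tid, count in log:
        run_slice(threads[tid], count)
    return shared["x"]


if __name__ == "__main__":
    final, log = record(seed=1)
    print("recorded:", final, "replayed:", replay(log))
    # Crude stand-in for the test mode: try many execution orders.
    print("outcomes seen:", sorted({record(s)[0] for s in range(20)}))
```

Calling `record` with different seeds is a crude stand-in for the test mode that tries many execution orders; any run whose final count falls short of `n_threads * rounds` has lost an update, and its log can be handed to `replay` to reproduce that exact interleaving.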