I never learned the application of this hardware idea, but its purpose was to perform a vast number of ‘add to memory’ operations on some large set of accumulators in memory. I think this idea came from CUNY. These operations would be generated by application logic (hardware or software) and would generally have to travel some physical distance to reach the right accumulator. The idea that was novel to me was to consolidate these operations while they travel. If the message *(int*)0x9862983468 += 47 met another such message with the same address, say one adding 5, the two would be merged into a single message adding 52 before reaching memory. Buffering and sorting the messages makes such collisions more likely. Even buffering in slow memory wins, because bandwidth to memory, not latency, is the key memory parameter. Such consolidation reduces the load on the scarcest commodity: access to the real accumulator.
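The buffer, sort and consolidate step can be sketched in software. This is only an illustration of the combining idea, not the original hardware design: the add_msg layout, the 64-bit field widths and the consolidate function are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One traveling "add to memory" request (hypothetical layout). */
typedef struct {
    uint64_t addr;   /* address of the target accumulator */
    int64_t  delta;  /* amount to add to it */
} add_msg;

/* Order messages by target address so that collisions become adjacent. */
static int cmp_addr(const void *a, const void *b) {
    uint64_t x = ((const add_msg *)a)->addr;
    uint64_t y = ((const add_msg *)b)->addr;
    return (x > y) - (x < y);
}

/* Sort a buffer of messages by address and merge those that share one.
   Returns the number of messages left; they occupy msgs[0 .. result). */
size_t consolidate(add_msg *msgs, size_t n) {
    assert(n == 0 || msgs != NULL);
    if (n == 0) return 0;
    qsort(msgs, n, sizeof *msgs, cmp_addr);
    size_t out = 0;                            /* index of last kept message */
    for (size_t i = 1; i < n; i++) {
        if (msgs[i].addr == msgs[out].addr)
            msgs[out].delta += msgs[i].delta;  /* same accumulator: combine */
        else
            msgs[++out] = msgs[i];             /* new accumulator: keep */
    }
    return out + 1;
}
```

Calling consolidate on a buffer of five messages that touch three distinct addresses leaves three messages, so only three accesses reach the real accumulators. A larger buffer raises the chance that two messages share an address, which is why even slow buffer memory can pay off.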
Here is another closely related idea. Common DIMM design is not appropriate to hosting such accumulators. There are two reasons:
There are many application-dependent parameters to work out.