As a signal travels through a large heterogeneous computer center, it frequently needs the attention of a CPU. If overall latency is to be minimized, this attention is needed quickly. Scheduling the CPU has almost always been the duty of the kernel, except in a few very specialized switching nodes whose small amount of code is devoted exclusively to switching packets of some small set of formats. HPC (High Performance Computing) harnesses many CPUs which do not all share RAM. Conventional kernels, at least, spend thousands of cycles deciding to allocate a CPU to a newly arrived signal. One can imagine that there are smarter ways to accomplish this allocation task but, frankly, I don’t know any, and I have tried to think of some. I am skeptical that Intel has found one.
I am also very skeptical of attempts to put so complex a protocol as the Internet stack, including TCP/IP, into hardware as suggested here. I should warn the reader that I have a low opinion of TCP/IP as a way to move a lot of data. I have no opinion of the InfiniBand protocols, for I have not been able to find them.
Sometimes I fear that companies rely on complexity as a business strategy. I have the x86 architecture in mind.
RDMA stands for “Remote Direct Memory Access”, according to Mellanox.
The information provided here says: