The problem that I see is that ‘endpoints’ in the PCIe complex can talk to each other with no limitation or authentication. Message formats include fields saying “this is from X”, but Y too can say “this is from X”. The PCIe BAR registers in an endpoint allow that endpoint to access anything on the PCIe network. They are capability registers. According to PCIe specs the contents of those registers are supposed to be controlled exclusively by the root complex. Imagine, however, how convenient it would be if the GPU code could modify those registers. The programmer and circuit designer are likely to work for the same company. It would open up all sorts of possibilities, especially for the bad guys. The schemes below relieve X of knowing its own identity or the identity of its correspondents; the network knows and can vouch for the sender’s identity and deliver messages to the right recipient.

In this scheme the kernel, working thru the root complex, defines circuits thru the PCIe network. The integrity of the circuit depends only on the portion of network thru which the circuit passes. The endpoints each see only one end of a full duplex circuit. Most endpoints are too simple to need more than one circuit. Some endpoints are able to terminate several circuits. The circuit protocol does not carry names of endpoints in any form. The circuits are analogous to physical wires but the kernel can rewire those circuits in a few short TLPs (Transaction Layer Packets). These notions are familiar to capability theorists.

We outline here a hardware design that limits and authenticates TLPs, where the kernel, thru the root complex, establishes those limits and authentications. I think that this is all compatible with the low level PCIe hardware described in the Wikipedia article, and that it offloads some function from the endpoints. We discuss two plans here. They agree on details about traffic between root complex and endpoints. They differ on routing traffic between endpoints. Both plans shrink the reliance set compared with what I understand it to be for PCIe.

As in PCIe, links are directed and thus switches know from which direction kernel authority comes. A device has an upstream port at which packets from the root complex arrive. Some devices have downstream ports via which they serve the subset of the tree that they oversee. There are in either plan two sorts of packets:

Packets of type A have a header field, called WHO, that shrinks as it consumes steering information provided at the root. An A packet arrives at a switch via the upstream port. If WHO is empty the packet has arrived at its destination. Otherwise the switch consumes n bits from WHO and forwards the packet to the downstream port identified by the n bits. n depends on the switch and is determined at the factory and is known by the root.
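The steering rule for A packets can be sketched in a few lines. This is only an illustration under my own assumptions: WHO is modeled as a string of '0' and '1' characters, fields are consumed from the front, and the names Node, n_bits, downstream and deliver_a are hypothetical, not part of any PCIe specification. The node called top stands for the first switch below the root complex.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A switch (n_bits > 0) or an endpoint (n_bits == 0). Hypothetical model."""
    name: str
    n_bits: int = 0                                  # factory-fixed selector width
    downstream: dict = field(default_factory=dict)   # port number -> Node

def deliver_a(node, who):
    """Steer an A packet downstream from node; return the destination's name."""
    while who:                         # empty WHO means the packet has arrived
        n = node.n_bits
        if n == 0 or len(who) < n:     # endpoint reached early, or WHO too short
            raise ValueError(f"malformed WHO at {node.name}")
        port, who = int(who[:n], 2), who[n:]   # consume n bits (assumed: from the front)
        node = node.downstream[port]           # forward to the selected downstream port
    return node.name

# A small example tree: a 4-port switch with a 2-port switch on its port 1.
sw1 = Node("sw1", 1, {0: Node("nic"), 1: Node("disk")})
top = Node("top", 2, {0: Node("gpu"), 1: sw1})
```

With this tree, deliver_a(top, "011") reaches the disk, while deliver_a(top, "01") stops at sw1 itself, which is how the root would address a command to a switch rather than to something below it.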

There are a few possible commands in an A packet that a switch can obey. Here is one command from the root complex that the switch will obey: “Report the number of downstream ports including information about buffering space.” This causes the switch to respond with a B packet to the upstream port including this information. The format of this packet is ‘switch configuration report’.

B packets also have a WHO field that grows as it gathers information on its way to the root. When an endpoint or switch originates a B packet it initializes the WHO field as the zero length bit string. A B packet arrives at a switch on downstream port k, and k is appended to WHO in an n bit field. When a B packet arrives at the root complex and has traveled thru correctly operating switches, the WHO field identifies the originating endpoint by logic that does not presume that the endpoint conforms to PCIe specs.
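Here is a sketch of how WHO might grow on the way up and how the root could then name the originator. I read “appended” literally, so the field written by the switch nearest the root ends up last and the root decodes from the tail; that ordering is my assumption, as are the bit-string representation, the nested-dict description of the tree, and the names grow_who and identify_origin.

```python
def grow_who(hops):
    """hops: (k, n) pairs in the order the B packet meets switches, origin side first."""
    who = ""                               # originator starts with the zero-length string
    for k, n in hops:
        if not 0 <= k < 2 ** n:
            raise ValueError("port number does not fit in n bits")
        who += format(k, f"0{n}b")         # switch appends its arrival port as n bits
    return who

def identify_origin(tree, who):
    """Root-side decode using the tree shape the kernel has already explored."""
    node = tree
    while who:
        n = node["n"]
        if n == 0 or len(who) < n:         # more bits than the known tree can explain
            raise ValueError("WHO inconsistent with known tree")
        k, who = int(who[-n:], 2), who[:-n]   # the last field came from the nearest switch
        node = node["ports"][k]
    return node["name"]

# Same shape as a 4-port switch with a 2-port switch on its port 1.
tree = {"name": "top", "n": 2, "ports": {
    0: {"name": "gpu", "n": 0, "ports": {}},
    1: {"name": "sw1", "n": 1, "ports": {
        0: {"name": "nic", "n": 0, "ports": {}},
        1: {"name": "disk", "n": 0, "ports": {}}}}}}
```

A packet from the disk passes sw1 on port 1 (one bit) and then top on port 1 (two bits), so grow_who([(1, 1), (1, 2)]) gives "101" and identify_origin(tree, "101") gives "disk". Nothing the endpoint puts in the packet affects that answer; only the switches write WHO.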

Coding the WHO field

Endpoint to Endpoint

Two plans; I hope only one is implemented.
  1. Routing in each switch
  2. Special routing switch
New TCB
On page 108 of PCIe System Architecture (abridged) we learn:

Either of my schemes changes this as follows. There is no configuration of ‘devices’ before communication, but there is exploration by the kernel of the shape of the tree to discover how to address the endpoints that are there. This shape is set in stone as the hardware is laid out, except for hot-plug extensions. Circuits between endpoints must be ‘configured’ by the kernel before any inter-endpoint communication.

Page 255 of same book says:

I think that this is a protocol between the kernel and the devices to reserve link bandwidth and buffer space for priority traffic.