> because PCIe 4 lanes will top out at forwarding around 7gbps of traffic [...] limited by PCIe alone
Are you sure about that? With 5 GT/s (or 500 MB/s) per lane, and with 4 lanes, that should be plenty, no? Intel adapters like the X520-DA2 are specced at 2x 10G, and use PCIe 2.0 x8.
FWIW, I was also able to get around 3.7 Gbps in iperf3 on an X520-DA2 connected to an RPi4's single-lane PCIe 2.0.
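To make the arithmetic explicit, here is a rough sketch in Python. It assumes PCIe 2.0's 5 GT/s signalling rate and 8b/10b line encoding, and it deliberately ignores TLP/DLLP headers and flow-control overhead, so real throughput will be somewhat lower. The function name is just something I made up for illustration.

```python
# Back-of-the-envelope PCIe 2.0 bandwidth per direction.
# Assumptions: 5 GT/s per lane, 8b/10b encoding (8 data bits per 10 line bits).
# Packet (TLP/DLLP) overhead is NOT modelled, so these are upper bounds.

def pcie2_per_direction_gbps(lanes: int) -> float:
    gt_per_s = 5.0               # PCIe 2.0 raw signalling rate per lane
    encoding_efficiency = 8 / 10  # 8b/10b line code
    return lanes * gt_per_s * encoding_efficiency

for lanes in (1, 4, 8):
    gbps = pcie2_per_direction_gbps(lanes)
    # Divide by 8 to convert Gbit/s to GByte/s
    print(f"x{lanes}: {gbps:.1f} Gbps ({gbps / 8:.1f} GB/s) per direction")
```

By this estimate x1 is about 4 Gbps, which lines up with the ~3.7 Gbps I measured on the RPi4, and x4 is about 16 Gbps per direction, comfortably above a single 10G link before protocol overhead.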
But PCIe is full duplex! With PCIe 2.0 x4 there are 4 lanes in each direction [1], so when 'forwarding' over a single 10G link you can expect to send and receive simultaneously at the speed I mentioned earlier.
Yeah, I guess dividing by 2 isn't fair. But transmitting does impact receiving and vice versa: when you're reading DMA descriptors, you need to wait for the read completions to come back, etc. Send and receive aren't fully uncontended, but they contend far less than a naive division by 2 would imply.