# PCI-Express slot speed and utilization?



## Bobi B. (Mar 2, 2020)

Recently, while researching a network performance issue, we came to suspect that the root cause was the PCI-Express slot the network interface adapter was installed in. The issue was that an Intel 10G adapter was not able to TX more than roughly 5G of data. A secondary adapter has since been installed and the issue is somewhat resolved.

Can pciconf(8) be used to measure PCI-Express slot speed?

```
# pciconf -lcv ix0
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 link x8(x8) speed 5.0(5.0) ASPM disabled(L0s)
# pciconf -lcv ix1 # or ix2
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x2(x8) speed 5.0(5.0) ASPM disabled(L0s)
```

Is the conclusion correct that ix0 is in an x8 PCI-Express v5.0 slot that runs at 31.52 GB/s (8 × 3.94 GB/s), whereas ix1 and ix2 (a dual-port adapter) are in an x2 PCI-Express v5.0 slot that runs at 7.88 GB/s (2 × 3.94 GB/s)?
What does it mean `endpoint max data 256(512)` or `endpoint max data 128(512)`?
How about `FLR` and `NS`?
Can network performance issues be attributed to the narrower PCI-Express slot?
Is there a way (a utility) to measure the volume of the data going through a specific PCI-Express slot?
Thank you for your time!


----------



## tingo (Mar 2, 2020)

In case you haven't already, read about "lane" and "lanes" in the Wikipedia PCI Express article: https://en.wikipedia.org/wiki/PCI_Express
Does it get clearer now?


----------



## Bobi B. (Mar 2, 2020)

One more detail: the motherboard is a SuperMicro X10SRi-F, which provides 6 PCI-Express slots: 1 PCI-E 2.0 ×2, 1 PCI-E 2.0 ×4, 2 PCI-E 3.0 ×8, 1 PCI-E 3.0 ×4 and 1 PCI-E 3.0 ×16. So the `5.0` in the pciconf(8) output is GT/s, not the PCI-Express version?


----------



## Phishfry (Mar 2, 2020)

Bobi B. said:


> output is GT/s, not PCI-Express version?


Exactly. You can match that up with the chart on Wikipedia tingo referenced.
It is transfer rate as found on the 'PCI Express link performance' chart.
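In other words, the `speed` field maps straight onto a generation in that chart. A minimal Python sketch of the lookup (values taken from the chart; the dict and names are mine, not anything pciconf exposes):

```python
# The "speed" pciconf prints is the per-lane transfer rate in GT/s;
# the PCIe generation follows from it (per the Wikipedia chart).
SPEED_TO_GEN = {2.5: "1.0", 5.0: "2.0", 8.0: "3.0", 16.0: "4.0"}

print(SPEED_TO_GEN[5.0])  # "2.0" -> the ix0/ix1 links are PCIe 2.0, not v5.0
print(SPEED_TO_GEN[8.0])  # "3.0"
```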


----------



## Phishfry (Mar 3, 2020)

Here is an NVMe device:

```
cap 10[70] = PCI-Express 2 endpoint max data 128(128) FLR NS
             link x4(x4) speed 8.0(8.0)
```

As you can see, it is running at PCIe 3 (speed 8.0) with an x4 link.
I realize "PCI-Express 2" can be confusing, but I think it parses as:
PCI-Express (2 endpoint) (max data 128)
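The `link` line also uses the `current(maximum)` convention for both width and speed. A quick illustrative parse in Python (the regex and field names here are my own, not how pciconf itself is structured):

```python
import re

# Pull the negotiated vs. maximum link parameters out of a pciconf -lcv
# "link" line, e.g. "link x2(x8) speed 5.0(5.0) ASPM disabled(L0s)".
LINK_RE = re.compile(r"link x(\d+)\(x(\d+)\) speed ([\d.]+)\(([\d.]+)\)")

def parse_link(line):
    cur_w, max_w, cur_s, max_s = LINK_RE.search(line).groups()
    return {"width": int(cur_w), "max_width": int(max_w),
            "speed": float(cur_s), "max_speed": float(max_s)}

print(parse_link("link x2(x8) speed 5.0(5.0) ASPM disabled(L0s)"))
# {'width': 2, 'max_width': 8, 'speed': 5.0, 'max_speed': 5.0}
```

So for ix1/ix2 the slot could accept an x8 card, but only 2 lanes were negotiated.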



Bobi B. said:


> How about `FLR` and `NS`?


FLR = Function Level Reset
NS = No Snoop

As found in: /usr/src/usr.sbin/pciconf/cap.c
FLR info found here: http://alexforencich.com/wiki/en/pcie/hot-reset-linux
NS info found here: https://software.intel.com/en-us/fo...optimization-platform-monitoring/topic/401498
Also see: `man pciconf`


----------



## ralphbsz (Mar 3, 2020)

PCI express speeds: V1.0 = 2.5 GT/s, V2.0 = 5 GT/s, V3.0 = 8 GT/s, per lane. These transfer speeds are in serial bits per second.
To encode bytes, PCIe V1 and V2 use 8B10B encoding, so you need 10 serial transfers for one byte, so the speed is 250 MByte/s and 500 MByte/s. PCIe V3 uses a more efficient encoding (I vaguely remember 64B-something, but maybe it was 128B130B, darn Alzheimer), so you can assume 8 bits per byte, and its speed is fundamentally 1 GByte/s per lane.

Next, as you wrote above: PCIe slots can have anywhere from 1 to 16 (or more?) lanes. Often, slots have fewer lanes connected than the socket allows for. So the output "link x8(x8)" above means this is an 8-lane socket with all 8 lanes connected, which can run an 8-lane card at full width, while "x2(x8)" means the physical socket can take an x8 card, but only 2 lanes are connected, so it will run at 2-lane speed. To get the speed, you can just multiply the lanes out; PCIe is very efficient at using all lanes. So the two sockets you describe are both PCIe V2 sockets, the first one with 8 lanes that can do 4 GByte/s, and the next two with 2 lanes which can do 1 GByte/s.
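Multiplying the lanes out for the two links in question (both negotiated at 5.0 GT/s, i.e. 500 MByte/s per lane after 8b/10b):

```python
# Theoretical one-direction link bandwidth: lanes x per-lane rate.
# At 5.0 GT/s (PCIe 2.0, 8b/10b) each lane carries 500 MByte/s.
PER_LANE_GEN2_MB_S = 500

print(8 * PER_LANE_GEN2_MB_S / 1000)  # 4.0 GB/s for the x8 link (ix0)
print(2 * PER_LANE_GEN2_MB_S / 1000)  # 1.0 GB/s for the x2 link (ix1/ix2)
```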

Now, can real-world hardware actually use the bandwidth efficiently? My old rule of thumb (in the 90s and early 2000s) was to assume that PCI bandwidth can only be utilized 80%, at best 90%. But in the 2010s, I saw some colleagues do marvelous things with driver tuning and correct configuration and workload balancing, and you can sometimes get to utilizing over 99% of the PCIe bandwidth, even with multiple cards. A lot depends on getting just the perfect firmware on the motherboard and all adapters to get over 90% utilization.

In any case, your 10gig ethernet card is at the painful edge. One of the interesting things is that PCIe runs on multiples of 2.5 GHz, while Ethernet (and I think Infiniband, but not SAS, I may have those switched around) runs on 3.125 GHz. So a 10gig ethernet card is physically a 12.5 GT/s card (10gigE uses 4 lanes of 3.125 GT/s each), and even after 8B10B encoding, that's still a full GByte/s. So putting that card into one of your 2-lane slots is on the edge of throttling the bandwidth, while in the x8 slot, it should be able to run at full speed with a huge safety margin.
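Comparing the card's wire rate against the two links makes that margin concrete (rough figures; PCIe protocol overhead beyond line encoding is ignored here):

```python
# 10GbE payload rate vs. what each PCIe link can move, in GByte/s.
ETH_10G_GB_S = 10 / 8    # 1.25 GB/s of Ethernet payload
X8_GEN2_GB_S = 8 * 0.5   # 4.0 GB/s -> huge headroom
X2_GEN2_GB_S = 2 * 0.5   # 1.0 GB/s -> below the card's wire rate

print(ETH_10G_GB_S <= X2_GEN2_GB_S)  # False: the x2 slot throttles the card
print(ETH_10G_GB_S <= X8_GEN2_GB_S)  # True: the x8 slot does not
```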

So why are you getting only "5G of data" (I presume you mean 1/2 GByte/s)? Good question. Could be the TCP/IP stack, could be an incompatibility between your card and the router at the other end, could be bad weather, or too much garlic eaten at lunch. Getting Ethernet speed problems debugged is a difficult art form, and not one that I specialize in.


----------

