
Understanding Bandwidth: Back to Basics

Richard Solomon

Apr 21, 2016 / 3 min read

One of the questions I've been getting a lot recently is along the lines of "How many lanes and of what 'generation' of PCI Express do I need?"

This is a fair question, and while coming up with a good first-order estimate is fairly straightforward, the method isn't necessarily obvious from the PCI Express specification. Let's start with the "raw" data rate, which is the easy part:

 

PCI Express Data Rates
  "Gen1"    2.5 Gb/s
  "Gen2"    5 Gb/s
  "Gen3"    8 Gb/s
  "Gen4"    16 Gb/s

 

Folks who are new to PCIe may be scratching their heads right about now and thinking "Richard said before that each generation of PCIe has doubled the bandwidth... so what happened between Gen2 and Gen3??!?!" That leads us to the second piece of the puzzle: the encoding scheme. The original PCI Express specification used "8b10b" encoding, which means every 8 bits of data were expanded to 10 bits when sent on the wire. I won't go into the details here of why this was done, but it was a common technique for limiting "runs" of 0s and 1s in the data stream. When the 5 Gb/s "Gen2" data rate was developed, it kept the same encoding scheme. However, when "Gen3" was being developed, it was hoped that by limiting the actual signaling rate to something below 10 Gb/s, simpler receivers could be defined (this ultimately didn't happen, but that's a story for another Flashback, I suppose). To do that, and still keep a "doubling", the encoding scheme for "Gen3" was changed to 128/130, meaning every 128 bits of data get expanded to only 130 bits (instead of to 160 as 8b10b would have).

So 5 Gigabits/second multiplied by 8/10 gives 4 Gigabits/second of effective data transfer, while 8 Gigabits/second multiplied by 128/130 gives about 7.88 Gigabits/second, which is close enough to a doubling.
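
If it helps to see that arithmetic spelled out, here is a minimal Python sketch of the per-lane calculation so far; the rates and encoding factors come straight from the tables in this post, while the layout of the snippet is just my own illustration:

```python
# Effective per-lane rate = raw signaling rate x encoding factor.
# Rates and encoding factors are as described above.
from fractions import Fraction

GENERATIONS = {
    "Gen1": (2.5e9, Fraction(8, 10)),      # 2.5 GT/s, 8b10b encoding
    "Gen2": (5.0e9, Fraction(8, 10)),      # 5 GT/s,   8b10b encoding
    "Gen3": (8.0e9, Fraction(128, 130)),   # 8 GT/s,   128/130 encoding
    "Gen4": (16.0e9, Fraction(128, 130)),  # 16 GT/s,  128/130 encoding
}

for gen, (raw_bits_per_second, encoding) in GENERATIONS.items():
    effective = raw_bits_per_second * float(encoding)
    print(f"{gen}: {effective / 1e9:.2f} Gb/s per lane")
# Gen1: 2.00   Gen2: 4.00   Gen3: 7.88   Gen4: 15.75
```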

"Ok Richard, I've got it - so I take the data rate, multiply by the encoding factor and I've got my real per-lane data rate, right?"

 

PCI Express Data Rates          Encoding Factor
  "Gen1"    2.5 GT/s            (8/10)
  "Gen2"    5 GT/s              (8/10)
  "Gen3"    8 GT/s              (128/130)
  "Gen4"    16 GT/s             (128/130)

 

Packet Efficiency

That's the first step, yes, but I'm afraid there's one more piece of the puzzle: the packet efficiency. This is just a reflection of the fact that there is overhead to every packet sent on PCI Express. Firstly, every data packet includes a header which is either 3 or 4 DWORDs (32-bit or 4-byte chunks), so we add 12 or 16 bytes of overhead for that. Every data packet also includes a 1-DWORD LCRC, so add 4 more bytes for that. Then there is a sequence number and some start/stop information; for simplicity we'll pretend that's always another 4 bytes total. (While true for "Gen1" and "Gen2", the 128/130 encoding scheme makes this not exactly accurate for "Gen3" and "Gen4", but it will do for our purposes at the moment.) Lastly, there is an optional End-to-End CRC called the ECRC which can be included in packets as well, at a cost of another 4 bytes.

Since ECRC isn't commonly used, let's just look at 3-DWORD and 4-DWORD header packets and add those 20 or 24 bytes of overhead to our PCI Express packet sizes. So for 128-byte packets, we actually have to send 128+20=148 or 128+24=152 bytes, which means our packet efficiency is 128/148=0.865 or 128/152=0.842. Doing that math for the rest of the packet sizes and expressing efficiency as a percentage gives the table below (the same arithmetic also appears as a short code sketch just after the table):

 

Header Size    Efficiency (%) for Various Packet Sizes (Bytes)
               128      256      512      1024     2048     4096
3-DWORD        86.5%    92.8%    96.2%    98.1%    99.0%    99.5%
4-DWORD        84.2%    91.4%    95.5%    97.7%    98.8%    99.4%
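
As promised, here is that efficiency math as a minimal Python sketch, assuming the simplified overhead model described above (3- or 4-DWORD header, 4-byte LCRC, 4 bytes of sequence/framing, no ECRC); the function name is just my own:

```python
# Packet efficiency = payload / (payload + overhead), using the simplified
# overhead model above: header (3 or 4 DWORDs), 4-byte LCRC, and 4 bytes of
# sequence number / framing (exact for "Gen1"/"Gen2" only), with no ECRC.
LCRC_BYTES = 4
SEQ_AND_FRAMING_BYTES = 4

def packet_efficiency(payload_bytes: int, header_dwords: int) -> float:
    overhead = header_dwords * 4 + LCRC_BYTES + SEQ_AND_FRAMING_BYTES
    return payload_bytes / (payload_bytes + overhead)

for header_dwords in (3, 4):
    row = [f"{packet_efficiency(size, header_dwords):6.1%}"
           for size in (128, 256, 512, 1024, 2048, 4096)]
    print(f"{header_dwords}-DWORD:", " ".join(row))
# 3-DWORD:  86.5%  92.8%  96.2%  98.1%  99.0%  99.5%
# 4-DWORD:  84.2%  91.4%  95.5%  97.7%  98.8%  99.4%
```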

 

So *NOW* you've got the calculation down! Take the "raw" data rate, multiply by the encoding factor, then by the packet efficiency to get the effective data rate per lane. Of course if you're using a multi-lane implementation, you get to multiply that by the number of lanes.

Header Packets

I should also mention that generally the use of 3-DWORD vs 4-DWORD headers is tied to whether your system is addressing 32-bit or 64-bit memory. So a small client system with less than 4GB of main memory might well use 100% 3-DWORD header packets, while a huge server running I/O Virtualization might come close to 100% 4-DWORD header packets. You could just be pessimistic and assume 100% 4-DWORD headers, or you could make your own assumptions. (Averaging the 3-DWORD and 4-DWORD efficiencies isn't uncommon, which is probably where the "85%" number commonly batted around as the "PCIe efficiency" comes from: 128-byte packets with an even mix of 3-DWORD and 4-DWORD headers.)

So for an 8-lane (aka "x8") "Gen3" implementation running 256-byte packets and using the more pessimistic 4-DWORD efficiency, we get: 8 Gb/s * (128/130) * (0.914) * 8 = 57.6 Gb/s, or 7.2 GB/s.
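
For anyone who wants to sanity-check that number (and the "85%" rule of thumb from the previous paragraph), here is the same arithmetic as a few lines of Python; the variable names are just my own:

```python
# x8 "Gen3" example: raw rate x encoding factor x packet efficiency x lanes.
raw_rate_gbps = 8.0            # "Gen3" signaling rate per lane, Gb/s
encoding = 128 / 130           # 128/130 encoding factor
efficiency = 256 / (256 + 24)  # 256-byte payload, 4-DWORD header (~0.914)
lanes = 8

link_gbps = raw_rate_gbps * encoding * efficiency * lanes
print(f"{link_gbps:.1f} Gb/s = {link_gbps / 8:.1f} GB/s")  # 57.6 Gb/s = 7.2 GB/s

# The commonly quoted "85%": average of the 128-byte 3-DWORD and 4-DWORD
# efficiencies from the table above.
print(f"{(128 / 148 + 128 / 152) / 2:.1%}")  # ~85.3%
```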

Clear as mud?

Probably needless to say, but I'll say it anyway: if this estimate is very close to your actual bandwidth needs and it's critical you never fall short, then do a more detailed analysis! In real-world systems we've used logic analyzers on actual hardware and measured the Synopsys controller IP hitting better than 98% of these numbers, but there are obviously many factors which can come into play. Contact your friendly local Synopsys Application Engineer or drop a note to me if you need help digging deeper into your own PCI Express application.
