Priyank Shukla, Sr. Staff Product Marketing Manager, Synopsys
A network of disaggregated compute resources connected through PCI Express (PCIe) and Ethernet creates a hyperscale data center infrastructure, offering a hyper-convergent compute platform. The network, as shown in Figure 1, primarily relies on two interfaces: PCIe/Compute Express Link (CXL) for chip-to-chip or in-a-rack-unit connectivity and Ethernet for off-the-rack connectivity. A PCIe Network Interface Card (NIC) converts PCIe to Ethernet and allows implementation of an Ethernet fabric through layers of network switches. This article explains the requirements for a 224Gbps electrical interface, including channels, signal modulation, and SerDes technology, in next-generation high-performance computing (HPC) designs.
Figure 1: HPC as a network of compute resources linked through PCIe and Ethernet
As shown in Figure 2, the throughput of a four-lane PCIe link matches the highest per-lane Ethernet data rate: PCIe 2.0 with 10Gbps, PCIe 3.0 with 25Gbps, PCIe 4.0 with 50Gbps, and PCIe 5.0 with 100Gbps. Over the last decade, this parity has allowed a x16 PCIe NIC to interface with a 40G/100G/200G or 400G Ethernet port without additional gear-boxing, saving total system power and minimizing latency.
Figure 2: x16 PCIe throughput parity with x4 Ethernet Port Bandwidth
PCIe 6.0, operating at 64Gbps, paves the way for a higher-than-100Gbps per-lane electrical Ethernet interface, which enables an efficient mapping of a x16 PCIe 6.0 link to an 800G Ethernet port.
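As a rough sanity check on this parity, the short Python sketch below compares the payload throughput of a x16 link per PCIe generation against common Ethernet port bandwidths. The transfer rates and encoding efficiencies are assumptions drawn from the public PCIe specifications (the PCIe 6.0 FLIT efficiency in particular is approximate), and the Ethernet port mapping is illustrative rather than normative.

```python
# Rough parity check: x16 PCIe payload throughput vs. Ethernet port bandwidth.
# Rates and encoding efficiencies are assumed from public PCIe specs; the
# Ethernet port mapping is illustrative, not normative.

PCIE_GENS = {
    # generation: (GT/s per lane, encoding efficiency)
    "PCIe 2.0": (5.0, 8 / 10),      # 8b/10b
    "PCIe 3.0": (8.0, 128 / 130),   # 128b/130b
    "PCIe 4.0": (16.0, 128 / 130),
    "PCIe 5.0": (32.0, 128 / 130),
    "PCIe 6.0": (64.0, 0.92),       # approximate FLIT-mode efficiency (assumed)
}

ETHERNET_PORTS_GBPS = {"40G": 40, "100G": 100, "200G": 200, "400G": 400, "800G": 800}

def x16_throughput_gbps(gt_per_lane: float, efficiency: float, lanes: int = 16) -> float:
    """Payload throughput of a PCIe link in Gbps (one direction)."""
    return gt_per_lane * efficiency * lanes

for gen, (rate, eff) in PCIE_GENS.items():
    tput = x16_throughput_gbps(rate, eff)
    # Pick the largest Ethernet port the x16 link can feed without gearboxing.
    port = max((p for p, bw in ETHERNET_PORTS_GBPS.items() if bw <= tput),
               key=lambda p: ETHERNET_PORTS_GBPS[p], default="n/a")
    print(f"{gen}: x16 ~= {tput:.0f} Gbps -> ~{port} Ethernet port")
```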
In addition, data center optics are evolving to support higher network bandwidth. Table 1 summarizes the projected availability of 100G/200G lambda optics. 800G-DR8/FR8 modules with 100G lambda optical transceivers have started to be deployed in HPC data centers. The next doubling of network bandwidth, with four-lane 800G optical modules and 102.4T switches, will depend on 200G lambda optics and the corresponding 200Gbps electrical interface.
Table 1: Adoption of 100G/200G lambda optics
Finally, with modern high-radix switches, the switching bandwidth of an HPC rack is constrained by the highest-density Ethernet port. Current 400G/800G Ethernet ports use four/eight lanes of 100Gbps electrical/optical transceivers. Table 2 highlights the Ethernet port bandwidth timeline, showing the need for a 200Gbps SerDes for a 102.4T switch rollout with 800G Ethernet ports.
Table 2: Industry¡¯s Ethernet switch/SerDes/port timelines
Following these trends, IEEE 802.3, ITU-T G-series OTN, and OIF-CEI have kicked off standardization efforts targeting higher-than-112Gbps per-lane electrical signaling.
OIF-CEI aims to standardize an electrical interface that can be used with multiple protocols including Ethernet, Fibre Channel, and Interlaken. The Ethernet data rate is the net digital throughput at the Media Access Control (MAC) layer, and Ethernet specifies end-to-end forward error correction (FEC) and an encoding scheme, which make the resulting raw SerDes electrical (line) rate higher than the per-lane Ethernet throughput. For instance, a four-lane 400G Ethernet link, which uses Reed-Solomon RS(544,514) FEC and 256b/257b encoding, needs an effective line rate of 106.25Gbps per lane. The question then becomes: when doubling the data rate, why doesn't the line rate simply double to 212.5Gbps? Why are we talking about 224Gbps?
Some of the early discussions on industry forums suggest that 800G and 1.6T Ethernet might go with concatenated or end-to-end FEC. In addition, there are other FEC considerations, such as staircase/zipper codes for 800G coherent links, all of which add different overheads. Fibre Channel and InfiniBand will also carry different overheads. As a result, OIF has started drafting the next-generation CEI-224G electrical interface.
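As a concreteness check, the minimal sketch below reproduces the 106.25Gbps per-lane line rate from the 256b/257b and RS(544,514) overheads named above and shows the naive doubling to 212.5Gbps; the remaining headroom up to 224Gbps is what accommodates the heavier FEC options just discussed.

```python
# Per-lane line rate of a 4-lane 400G Ethernet link: MAC throughput per lane,
# scaled by the 256b/257b transcoding and RS(544,514) FEC overheads.

mac_rate_per_lane_gbps = 400 / 4          # 100 Gbps of MAC throughput per lane
encoding_overhead = 257 / 256             # 256b/257b transcoding
fec_overhead = 544 / 514                  # RS(544,514) FEC

line_rate = mac_rate_per_lane_gbps * encoding_overhead * fec_overhead
print(f"per-lane line rate ~= {line_rate:.2f} Gbps")   # ~= 106.25 Gbps

# Naively doubling gives 212.5 Gbps; the headroom up to 224 Gbps leaves room
# for the different (e.g. concatenated or stronger) FEC choices at 800G/1.6T.
print(f"doubled ~= {2 * line_rate:.1f} Gbps")          # 212.5 Gbps
```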
Higher-order modulation increases the bits per symbol, or bits per unit interval (UI), and offers a tradeoff between channel bandwidth and signal amplitude. It is common for standards to explore higher-order modulation schemes as data rates increase. In 2012, PAM-4/6/8 were considered for 100Gbps line rates, and PAM-4 eventually emerged as the modulation of choice for 56Gbps and 112Gbps SerDes. With a holistic electrical-optical-electrical (E-O-E) system view, it becomes clear that if the modulation scheme and data rate of the optical transceiver and the electrical SerDes are not the same, every E-O-E conversion needs a gearbox, adding power and latency overhead. Choosing the electrical modulation scheme is therefore closely linked to the modulation scheme chosen for 200G lambda optics. Initial work on 200G lambda optical transceivers suggests that either PAM-4 or PAM-6 modulation is plausible.
PAM-4 modulation offers backward compatibility with previous generations, provides a better signal-to-noise ratio (SNR) than higher-order modulation schemes, and allows a lower-overhead FEC architecture that results in lower latency. However, its implementation requires a better analog front end (AFE), due to analog bandwidth limitations, and advanced equalization through innovative DSP schemes.
PAM-6 modulation can encode 2.5 bits per symbol, and its implementation with DSQ-32 coding incurs an SNR loss of about 3.2dB compared to PAM-4. It offers a way to implement a SerDes with lower AFE bandwidth than a PAM-4 SerDes, but it requires a higher SNR. The PAM-6 modulation scheme also adds higher FEC overhead, which results in more area and power and reduced coding efficiency.
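To make the bandwidth/SNR tradeoff concrete, the sketch below compares the symbol rate and first-order Nyquist bandwidth of a 224Gbps lane under PAM-4 and PAM-6 (DSQ-32). The ~3.2dB SNR penalty is the figure quoted above, not derived here, and the Nyquist estimate is a simple baud-rate/2 approximation.

```python
# Bandwidth vs. SNR tradeoff for a 224 Gbps lane under different modulations.
# Bits per UI: PAM-4 = 2, PAM-6 with DSQ-32 coding = 2.5 (as described above).

LINE_RATE_GBPS = 224

modulations = {
    "PAM-4":          {"bits_per_ui": 2.0, "snr_penalty_db": 0.0},
    "PAM-6 (DSQ-32)": {"bits_per_ui": 2.5, "snr_penalty_db": 3.2},  # vs. PAM-4, per the text
}

for name, m in modulations.items():
    baud_gbd = LINE_RATE_GBPS / m["bits_per_ui"]   # symbol rate in GBd
    nyquist_ghz = baud_gbd / 2                     # first-order channel bandwidth need
    print(f"{name}: {baud_gbd:.1f} GBd, Nyquist ~= {nyquist_ghz:.1f} GHz, "
          f"SNR penalty ~= {m['snr_penalty_db']} dB")
# PAM-4: 112.0 GBd (56 GHz); PAM-6: 89.6 GBd (44.8 GHz) with ~3.2 dB less SNR margin.
```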
High-speed electrical SerDes applications continue to evolve. OIF started the first 112G-VSR project in August 2016 and added multi-chip module (MCM), extra short reach (XSR), medium reach (MR), and long reach (LR) projects in 2017/18. In 2021, OIF started two new projects for 112G linear and 112G-XSR+. OIF is now working on the CEI-224G standard to identify and define next-generation electrical interfaces in a typical system: die-to-die, die-to-OE (optical engine), chip-to-module, and chip-to-chip within a printed circuit board assembly (PCBA), between two PCBAs over a backplane/midplane or a copper cable, or even between two chassis.
Along similar lines, the IEEE Standards Board recently approved the IEEE P802.3df Project Authorization Request (PAR) to define Media Access Control parameters, physical layers, and management parameters for 200Gbps, 400Gbps, 800Gbps, and 1.6Tbps Ethernet. The task force will develop a 200Gbps-per-lane electrical signaling standard for 1/2/4/8-lane variants of attachment unit interfaces (AUIs) and electrical physical medium dependent (PMD) sublayers.
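The 1/2/4/8-lane variants map directly onto the rates in the PAR, as the quick enumeration below shows; the interface labels printed here are illustrative shorthand, not final IEEE interface names.

```python
# 200 Gbps-per-lane electrical signaling mapped onto the P802.3df rates.
# The printed labels are illustrative shorthand, not final IEEE names.

LANE_RATE_GBPS = 200
for lanes in (1, 2, 4, 8):
    port_gbps = lanes * LANE_RATE_GBPS
    label = f"{port_gbps / 1000:.1f}T" if port_gbps >= 1000 else f"{port_gbps}G"
    print(f"{lanes}-lane AUI/PMD -> {label} Ethernet")
# 1 -> 200G, 2 -> 400G, 4 -> 800G, 8 -> 1.6T
```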
Figure 3: Early use cases of 224Gbps electrical SerDes
Figure 3 shows early use cases of 224Gbps electrical SerDes, which the industry expects to be adopted by 102.4T switches and by 800G/1.6T coherent optical modules that require 224Gbps XSR/XSR+/VSR/LR SerDes.
The 224Gbps data rate shrinks the unit interval (UI) to the order of a logic gate delay. Even with PAM-6 modulation, the UI at 224Gbps will be ~11ps. Looking at the evolution of HPC data center channels, and considering improvements in packaging and PCB materials, it is apparent that a 224Gbps transceiver will need a flexible, high-bandwidth AFE that stretches the boundaries of available transistor bandwidth.
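The ~11ps figure follows directly from the bits per UI of each modulation; the quick check below reproduces it (the comparison to logic delay is qualitative and not computed here).

```python
# Unit interval (UI) at 224 Gbps for the candidate modulation schemes.

LINE_RATE_BPS = 224e9

for name, bits_per_ui in (("PAM-4", 2.0), ("PAM-6 (DSQ-32)", 2.5)):
    symbol_rate = LINE_RATE_BPS / bits_per_ui      # symbols per second
    ui_ps = 1e12 / symbol_rate                     # UI duration in picoseconds
    print(f"{name}: UI ~= {ui_ps:.2f} ps")
# PAM-4 ~= 8.93 ps, PAM-6 ~= 11.16 ps -- on the order of a logic gate delay
# in advanced nodes, which is what pushes the AFE and clocking design.
```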
Figure 4: End-to-end 224Gbps electrical link highlighting receiver DSP
As shown in Figure 4, the 224Gbps receiver must have an adaptive and differentiated DSP to work across conventional VSR/MR/LR channels. Two SoCs implementing 224G SerDes can be interconnected through diverse channels, with channel loss ranging from the low teens to over 40dB. To ensure an error-free link, it is crucial to maintain a compliant pre-FEC raw bit error rate (BER). Figure 5 illustrates the performance of 3nm 224G silicon across channels ranging from 13dB to 42dB, demonstrating margins of 100,000x to 1,000,000,000x better than the specification.
Figure 5: 224G silicon across channels ranging from 13dB to 42dB
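To put those margins in perspective, the sketch below converts the stated margin factors into absolute pre-FEC BER values. The spec threshold used here (on the order of 1e-4, typical for RS(544,514)-based links) is an assumption for illustration, not a figure from this article.

```python
# Translate "100,000x to 1,000,000,000x better than spec" into absolute pre-FEC BER,
# assuming a pre-FEC BER threshold on the order of 1e-4 (an illustrative assumption,
# typical for RS(544,514)-protected links, not a figure from this article).

SPEC_PRE_FEC_BER = 1e-4

for margin in (1e5, 1e9):
    measured_ber = SPEC_PRE_FEC_BER / margin
    print(f"margin {margin:.0e}x -> pre-FEC BER ~= {measured_ber:.0e}")
# 1e+05x -> ~1e-09, 1e+09x -> ~1e-13: deep margin to the FEC threshold across
# channels spanning 13 dB to 42 dB of insertion loss.
```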
224G SerDes is needed to continue the pace of data processing in HPC data centers. Ethernet has become the de facto standard for server-to-server communication in modern HPC data centers, and for this reason, organizations are defining and developing next-generation electrical interfaces, including 224Gbps. Synopsys provides a complete 200G/400G and 800G Ethernet controller and PHY IP solution that includes the Physical Coding Sublayer (PCS), Physical Medium Dependent (PMD), Physical Medium Attachment (PMA), and auto-negotiation functionality. While the 800G and 1.6T Ethernet definitions are underway, Synopsys' Ethernet IP solutions are enabling early adoption of 800G/1.6T per-port bandwidth with 224G SerDes.