The Universal Chiplet Interconnect Express (UCIe) standard is creating many possibilities for the semiconductor industry, offering high-bandwidth, low-power, and low-latency die-to-die connectivity in multi-die designs. It enables innovative applications, such as custom HBM (cHBM), that require high-bandwidth connectivity between the I/O die and the HBM DRAM stack dies. This article dives deeper into the different interfaces that UCIe supports to enable network-on-chip (NoC) interconnects.
UCIe defines a comprehensive set of protocols and layers that standardize communication between dies (also called chiplets). The standard ensures data can be transmitted at high speed with minimal latency and power consumption. UCIe includes three layers, as shown in Figure 1: the protocol layer, the die-to-die adapter, and the physical layer (PHY).
Figure 1: The UCIe specification layers
In streaming FLIT mode, the data sent over the die-to-die interface is packed into FLITs, whose formats are derived from the PCIe and CXL protocols. There are six FLIT formats defined by the UCIe standard:
Format 1: Raw
Format 2: 68B FLIT
Format 3: Standard 256B end header FLIT
Format 4: Standard 256B start header FLIT
Format 5: Latency-optimized 256B FLIT without optional bytes
Format 6: Latency-optimized 256B FLIT with optional bytes
Formats 2 to 6 allow allocation of bytes for CRC and header, which the die-to-die adapter uses to enable retry and an almost error-free link.
In streaming raw mode, the die-to-die adapter does not convert the application data into FLITs. This mode logically connects the PHY RDI interface to the application layer and is the lowest latency path of the die-to-die interconnect.
Synopsys UCIe Controller IP supports different interfaces to the SoC application layer as part of the protocol layer, like CXS, AXI, and CHI C2C. These interfaces are implemented over the streaming FLIT mode of the die-to-die adapter, which means they use one of the FLIT formats defined in the UCIe standard.
Depending on the application, a system can adopt any of the given die-to-die interface types.
Designers must understand whether the multi-die design is captive or not. In a captive multi-die design, dies from the same vendor interoperate with each other over a die-to-die IP, and that vendor is responsible for the data connectivity between the dies. Such a use case is prevalent in the industry: many companies are designing systems where they add functionality, or extend what they already have, in another die of their own.
One example of a captive application is a large server die split in half, with the two halves intended to behave like a single processing unit. Such applications are a functional split with a transparent data tunnel from one die to the next, requiring very high bandwidth of several terabits per second over the die-to-die interface.
Another example of a captive system is an I/O chiplet connected to a processing-unit chiplet, or a main compute die connected to an AI accelerator chiplet. In such cases, the protocol used could be streaming FLIT or streaming raw, depending on whether CRC and retry in the die-to-die adapter are required. Both streaming raw and streaming FLIT interfaces allow vendor-proprietary NoCs to be connected over the die-to-die interface, providing a convenient, low-latency path for system connectivity with no data conversion between one die and the other. Streaming FLIT mode packs data into one of the six FLIT formats, and the die-to-die adapter adds CRC and header bytes. This enables a retry mechanism: data is stored in a buffer before traveling over the die-to-die link, and if any errors are detected, the buffered data is resent to provide error-free communication. For these reasons, systems can leverage die-to-die communication without modifying anything on the proprietary NoC.
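To make the retry mechanism concrete, here is a minimal, hypothetical sketch of a replay buffer with CRC checking. UCIe defines its own CRC polynomial and FLIT framing; `zlib.crc32` and the `Sender`/`receive` names below are stand-ins for illustration only.

```python
import zlib

# Minimal, hypothetical sketch of the die-to-die adapter's CRC/retry loop.
# UCIe defines its own CRC and FLIT framing; zlib.crc32 and these names
# are stand-ins for illustration.

class Sender:
    def __init__(self):
        self.replay_buffer = []  # FLITs held until the receiver acknowledges

    def send(self, payload: bytes):
        flit = (payload, zlib.crc32(payload))  # attach CRC to the payload
        self.replay_buffer.append(flit)
        return flit

    def retry(self, index: int):
        # Resend a FLIT still held in the replay buffer
        return self.replay_buffer[index]

    def ack(self, count: int):
        # Receiver confirmed delivery; free the buffered FLITs
        del self.replay_buffer[:count]

def receive(flit):
    payload, crc = flit
    # CRC mismatch -> signal the sender to replay from its buffer
    return payload if zlib.crc32(payload) == crc else None
```

A corrupted FLIT fails the CRC check and is replayed from the sender's buffer; only acknowledged FLITs are freed, which is what makes the link appear lossless to the layers above.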
In non-captive systems, dies from two different vendors interoperate with one another. This open ecosystem approach, using off-the-shelf chiplets from various sources, is the UCIe standard's ultimate goal. Each die in a non-captive system implements an isolated function designed to optimize a particular task. The dies often require low to medium bandwidth.
Since interoperability is a must between two dies in a non-captive application, there is benefit in using industry-standard protocols like PCIe and CXL. Such standard protocols have software and ecosystem support that eases the transition from one generation to the next. Protocols like CXL can also provide cache coherency between two dies if required. An example of such a system is a compute die from one vendor interoperating with an accelerator die from a different vendor.
There are several other applications that require die-to-die connectivity.
The first application, shown in Figure 2, is a server or compute die with homogeneous dies on both sides of the die-to-die interconnect. These chiplets require a NoC-to-NoC interface with low latency: CXS if coherency is required, or AXI if it is not. A CXS interface receives data in the CXS signal format, which can be either CCIX 2.0 or CHI from the SoC application, and converts it into FLITs. For example, the Synopsys UCIe Controller with CXS interface uses the 68B FLIT format 2 for CCIX 2.0 data and the 256B latency-optimized FLIT format 6 for CHI data. The same applies to the AXI interface, which takes AXI4/AXI3 interface signals and converts them into FLITs. These interfaces connect directly to the SoC NoC to run traffic between two dies. The interface can also be user-defined or proprietary, in which case designers can use the UCIe die-to-die adapter's streaming raw or streaming FLIT interfaces.
Figure 2: Example of a server chip with homogeneous dies on both sides
The second application, shown in Figure 3, connects a compute die to an accelerator chiplet. The interface protocol generally requires low latency and coherency, and in some cases targets an open chiplet marketplace. In such an application, designers can rely on protocols like CXL or PCIe for interoperability, or leverage the UCIe streaming interfaces when the dies on both sides are from the same vendor.
Figure 3: Server and Accelerator chiplets on both sides leveraging the CXL protocol
Figure 4 shows a die splitting use case where the IO chiplet with Ethernet or PCIe is connected to a compute chiplet. These applications are mainly captive and use streaming raw or streaming FLIT interfaces. They can also use interfaces like AXI if the NoC on the server die also uses AXI.
Figure 4: IO chiplet with compute die interoperating over streaming interface
Most of today's multi-die designs implement captive dies. HPC and AI are leading applications for such multi-die designs.
As shown in Figure 5, AXI is one of the leading SoC NoC interfaces in most multi-die designs today. The CXS interface, which is used extensively with Arm NoCs, can support cache coherency. The Synopsys UCIe Controller supports the CXS interface to help transfer CHI C2C data over the interconnect. Synopsys' controller is optimized to interoperate with Arm NoCs and Arteris IP NoCs. The rest of the market mainly uses streaming raw or FLIT interfaces, depending on the application, providing the lowest-latency interface from one die to the other. PCIe and CXL protocols are also used where standardization is necessary.
Figure 5: Usage breakdown of NoC interfaces
AXI provides a single interface between manager and subordinate. Each AXI channel, as shown in Figure 6, transfers information in only one direction. The architecture does not require any fixed relationship between the channels, so they can be considered independent.
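For reference, the five AMBA AXI channels and the single, fixed direction each one carries can be summarized as follows; the channel names are standard AXI, while the dictionary form is just for illustration.

```python
# The five AMBA AXI channels and the fixed direction of each.
# Channel names are standard AXI; the dict form is illustrative only.
AXI_CHANNELS = {
    "AW": "manager -> subordinate",  # write address
    "W":  "manager -> subordinate",  # write data
    "B":  "subordinate -> manager",  # write response
    "AR": "manager -> subordinate",  # read address
    "R":  "subordinate -> manager",  # read data and response
}
```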
Figure 6: AXI interface channel overview
The interface to the UCIe die-to-die adapter is just a tunneling interface that transfers data from an AXI interface (manager or subordinate) on one die to another AXI interface (subordinate or manager, respectively) on another die. It does not manipulate the data in any form. UCIe streaming FLIT, over which the AXI interface is implemented, uses the retry mechanism defined by the UCIe standard. When the retry mechanism is enabled, UCIe offers a lossless point-to-point communication channel. The implementation can use any of the defined FLIT formats, as chosen by the designer.
For example, the Synopsys AXI implementation uses FLIT format 2 or 6. Streaming FLIT format 2 can be used to transport the AXI information when lower latency is required, but the bandwidth overhead introduced by UCIe increases compared to streaming FLIT format 6. When higher bandwidth is needed, streaming FLIT format 6 can be used to transport the AXI information, at higher latency. This packing of AXI data into FLITs is a proprietary implementation that must be present on both sides of the die-to-die interconnect, so the AXI data can be retrieved on the other die in the same fashion as it was packed initially. This creates a limitation when using the AXI interface for die-to-die connectivity: two different vendors implementing AXI over a die-to-die interconnect like UCIe cannot interoperate with one another. This limitation applies to all vendors in the industry that have an AXI-over-UCIe implementation.
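The interoperability constraint can be illustrated with a minimal, hypothetical packing scheme: both dies must agree on the exact same layout, or the receiver cannot recover the AXI beats. The field widths and the 250-byte payload size below are invented for illustration and are not the actual Synopsys mapping.

```python
import struct

# Hypothetical sketch of proprietary AXI-to-FLIT packing. Channel IDs
# 1..5 stand for AW, W, B, AR, R; 0 marks padding. The 3-byte "<BH"
# header per beat and the 250B payload size are illustrative assumptions.

FLIT_PAYLOAD = 250  # assumed usable payload bytes per FLIT

def pack_beats(beats):
    """Pack (channel_id, data) beats into fixed-size FLIT payloads."""
    blob = b"".join(struct.pack("<BH", ch, len(d)) + d for ch, d in beats)
    # Split into FLIT-sized chunks, zero-padding the last one
    return [blob[i:i + FLIT_PAYLOAD].ljust(FLIT_PAYLOAD, b"\x00")
            for i in range(0, len(blob), FLIT_PAYLOAD)]

def unpack_beats(flits):
    """The receiving die reverses the exact same layout to recover the beats."""
    blob = b"".join(flits)
    beats, i = [], 0
    while i + 3 <= len(blob):
        ch, n = struct.unpack_from("<BH", blob, i)
        if ch == 0:  # reached zero padding
            break
        beats.append((ch, blob[i + 3:i + 3 + n]))
        i += 3 + n
    return beats
```

Because `unpack_beats` must mirror `pack_beats` byte for byte, a die using any other layout would misparse the stream, which is precisely why independent AXI-over-UCIe implementations do not interoperate.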
Figure 7 shows an example of the Synopsys AXI implementation with continuous individual read and write from different addresses (no bursts) mapped to FLIT format 6. There are different read and write requests over the read address, write address, and write data channels going from an AXI manager to subordinate that are packed into FLITs. The lower part of Figure 7 shows the responses of the read and write requests from the subordinate to manager.
Figure 7: An example of the Synopsys AXI implementation with continuous individual read and write from different addresses mapped to FLIT format 6
The efficiency of UCIe FLIT packing, measured as payload over the header and CRC data bytes, is 94.11% for the 68B streaming FLIT format 2 and 97.65% for the 256B streaming FLIT format 6. Each AXI channel also carries sideband signaling: the write data channel, for example, includes write valid, write last, and write ready signals alongside the data itself. Overall, because the FLITs also carry this extra per-channel signaling, the effective data-payload efficiency of AXI transactions is lower.
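The quoted efficiencies can be reproduced as payload bytes over total FLIT bytes. The 64B and 250B payload figures below are inferred from the percentages, assuming header and CRC consume the remaining 4B and 6B respectively:

```python
# FLIT-packing efficiency = payload bytes / total FLIT bytes.
# Payload sizes are assumptions back-solved from the quoted percentages:
# 68B format 2 = 64B payload + 4B header/CRC,
# 256B format 6 = 250B payload + 6B header/CRC.
def flit_efficiency(payload_bytes: int, flit_bytes: int) -> float:
    return 100 * payload_bytes / flit_bytes

fmt2 = flit_efficiency(64, 68)    # ~94.11% for 68B streaming FLIT format 2
fmt6 = flit_efficiency(250, 256)  # ~97.65% for 256B streaming FLIT format 6
```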
Synopsys offers a complete solution for UCIe, including PHY, controller, and verification IP. As a leader in multi-die designs, Synopsys fosters collaboration to move innovation forward. The Synopsys UCIe PHY IP supports 16G, 32G, 40G, and 64G data rates on the most advanced process and packaging technologies. The Synopsys UCIe Controller supports streaming raw, streaming FLIT, interfaces like AXI, CXS, and CHI C2C, and protocols like PCIe and CXL. Our partnerships with industry-standard NoC vendors like Arm and Arteris IP ensure system interoperability and high performance, making implementation easier for our customers.