How Chiplets Assemble Into the Most Advanced SoCs

For this article, I decided to take a quick pause from Verilog to write about a topic I have been researching for the past few months. You have probably come across the term “chiplet”, and may be wondering what this latest trend in SoC design is about. In this article, I will explore some of the background topics and technologies around chiplet-based designs, and give you many links to follow to find out more. I hope you find this topic just as interesting as Verilog coding. Here we go!

Why Chiplet Designs

Today’s complex SoCs are approaching (and in some cases have already exceeded) the physical limit on how large a single silicon die can be manufactured. This limit is called the reticle limit. According to an article in Protocol:

But big die sizes create big problems. One fundamental issue is that it’s currently impossible to print a chip larger than the blueprint used in the photolithography stage of chip manufacturing, called a photomask. Because of technical limits, the beam of light shining through the photomask to reproduce the blueprint onto the silicon wafer cannot print chips larger than about 850 square millimeters.

Chiplets helped save AMD. They might also help save Moore’s law and head off an energy crisis. Protocol, July 20 2022

Therefore, without breaking a design up into multiple dies (or chiplets), engineers simply will not be able to build some cutting-edge SoCs.

The economics of producing monolithic SoCs on a single large die, at cutting-edge technologies, is becoming (or may have already become) prohibitive. A large, monolithic die is more prone to irrecoverable defects than a smaller die, causing lower yield and higher overall cost. In a chiplet-based design, each chiplet is smaller, so less silicon needs to be thrown away for each irrecoverable defect, leading to lower cost. The problem is compounded by the ever-higher wafer cost of the latest lithography nodes. AMD estimates that using a chiplet-based design in their EPYC processor led to a >40% reduction in cost (AMD on Why Chiplets—And Why Now – The Next Platform).
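
To make the yield argument concrete, here is a minimal back-of-the-envelope sketch using a simple Poisson die yield model. The defect density and die sizes are illustrative assumptions, not AMD’s actual numbers; real cost models also fold in wafer pricing, scribe lines, and packaging.

```python
import math

def die_yield(area_mm2: float, defect_density_per_mm2: float) -> float:
    """Poisson yield model: probability that a die of the given area
    contains no killer defects."""
    return math.exp(-area_mm2 * defect_density_per_mm2)

# Illustrative defect density: 0.1 defects/cm^2 = 0.001 defects/mm^2 (assumed).
D0 = 0.001

# One 800 mm^2 monolithic die vs. four 200 mm^2 chiplets (same total silicon).
monolithic_yield = die_yield(800, D0)   # ~44.9%
chiplet_yield = die_yield(200, D0)      # ~81.9%

# Relative silicon cost per *good* SoC: total area divided by yield
# (ignoring packaging cost, which a later section comes back to).
monolithic_cost = 800 / monolithic_yield
chiplet_cost = (4 * 200) / chiplet_yield

print(f"monolithic yield: {monolithic_yield:.1%}, chiplet yield: {chiplet_yield:.1%}")
print(f"silicon cost ratio (chiplet/monolithic): {chiplet_cost / monolithic_cost:.2f}")
```

With these assumed numbers, the chiplet version spends only about 55% of the silicon cost per good SoC, which is at least in the same ballpark as AMD’s >40% figure.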

When an SoC is broken up into chiplets, the design becomes more modular. The first advantage is that each chiplet can be manufactured on the lithography technology best suited for its function. For example, many radio frequency (RF) circuits do not perform well in cutting-edge logic process technologies. These circuits can potentially be placed on a chiplet that uses a less dense, but more suitable, process technology. Another example is separating large SRAM memories (like large system caches) from compute logic onto a different die, which allows the process technology to be optimized for each die, leading to better overall metrics like power and operating frequency. One can even imagine building SoCs using process technologies from different foundries, and stitching the chiplets together into a single SoC.

Having a more modular design also facilitates reuse. A “holy grail” of the industry is to source entire chiplets from different IP/chiplet providers and stitch them together into a custom SoC, much like how SoC designers source IPs from different vendors today. There are numerous challenges to achieving that vision, the first of which is that, until recently, there was no standard describing how different chiplets communicate with each other. In 2022, Universal Chiplet Interconnect Express (UCIe) 1.0 started this effort of standardizing die-to-die interconnects across the industry. It has gathered much early interest, but it is going to be a long journey.

One final advantage of chiplet-based design applies in particular to 3D-integrated SoCs. With 3D chiplet integration, there is now a third dimension in which to route signals, leading to potentially shorter distances between logic. 3D integration can obviously also achieve higher area density. With the pace of lithography advances slowing in the latest process generations, 3D integration can become an important way to continue packing more transistors into a given area (Moore’s Law). The shorter distances between logic, when routed through the third dimension, also translate into multiple advantages, like lower latency and lower power (less wire capacitance).
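
As a rough feel for the power argument, the toy sketch below compares the switching energy of a long 2D cross-die route against a short vertical 3D hop. The capacitance and voltage figures are assumed values of typical magnitude, not measurements of any real design.

```python
# Dynamic energy drawn from the supply to charge a wire: E ~ C * V^2.
# Both parameters below are assumed, typical-magnitude values.
C_PER_MM_FF = 200.0   # wire capacitance per mm, in femtofarads (~0.2 pF/mm)
VDD = 0.75            # supply voltage in volts

for label, length_mm in [("2D route across a large die", 10.0),
                         ("3D route via vertical stacking", 1.0)]:
    energy_fj = C_PER_MM_FF * length_mm * VDD ** 2
    print(f"{label:32s} ~{energy_fj:6.0f} fJ per 0->1 transition")
```

A 10x shorter route means roughly 10x less wire capacitance to charge, and correspondingly less energy per bit moved.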

Challenges of Chiplet Designs

While chiplet-based designs have many advantages, there are also many challenges in building these complex designs.

Firstly, each additional chiplet integrated into a single package adds a risk of a defect in the packaging step, leading to a non-functional package and yield loss at the package level. The entire package, containing multiple chiplets, may then need to be thrown away altogether. Therefore, while the cost of manufacturing the individual chiplets decreases as each chiplet gets smaller, the cost of packaging them together increases. For low-cost product segments, single monolithic dies may remain the most economical choice.
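
Extending the earlier yield sketch, the snippet below shows how package-level assembly yield erodes as the chiplet count grows. The 99% per-chiplet attach yield is an assumed, illustrative figure; real assembly yields are process- and vendor-specific.

```python
# Assume each chiplet attach succeeds independently with probability 0.99
# (an illustrative assumption). The package survives only if all attaches do.
ATTACH_YIELD = 0.99

for n_chiplets in (1, 2, 4, 8, 16):
    package_yield = ATTACH_YIELD ** n_chiplets
    print(f"{n_chiplets:2d} chiplets -> assembly yield {package_yield:.1%}")
# At 16 chiplets, ~15% of packages are scrapped, along with every
# (otherwise good) die inside them.
```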

Today’s die-to-die interfaces, the workhorse of die-to-die communication in multi-chiplet designs, occupy more silicon area than standard on-die wires, imposing an “area tax”: die-to-die interfaces simply are not as dense as regular wires on a single die. For a signal to cross to a different chiplet, it must first be routed to a die-to-die interface PHY, driven off-die to the die-to-die interface PHY of the other die, and finally routed back onto on-die wires. There are limits on how densely wires can be printed on the material connecting the dies (substrate material, or a silicon “bridge”), on how small and dense the solder bumps can be made (in technologies where solder makes the die-to-die connections, like Intel Foveros), and simply the area of the additional die-to-die PHY logic.

Designing and optimizing across multiple chiplets is obviously more difficult and complex than building a single monolithic die. Electronic Design Automation (EDA) vendors are actively working on tools to help partition, design, and analyze multi-chiplet designs integrated in 2D and 3D. Some tools have already been released for these design flows, such as Cadence 3D-IC and Synopsys 3DIC Compiler.

Finally, as alluded to earlier, there is not yet a mature standard for die-to-die communication. Multi-chiplet design efforts have so far been limited to companies that are vertically integrated from IP to SoC product (and even to manufacturing, like Intel). Universal Chiplet Interconnect Express (UCIe) 1.0 defines a common PHY layer, and a protocol layer that carries the Peripheral Component Interconnect Express (PCIe) and Compute Express Link (CXL) protocols over a die-to-die interface. However, if you need to carry other protocols, the specification essentially leaves the definition to the implementer. To build a chiplet ecosystem that can fully interoperate, more standardization is needed for carrying protocols beyond PCIe and CXL (such as the AMBA protocols).

Chiplet (Die-to-Die) Interfaces

There are already many existing, competing die-to-die interfaces in the industry. The following table shows some die-to-die interfaces currently in the market.

| Standard | Promoter | Description |
| --- | --- | --- |
| AIB (Advanced Interconnect Bus) | Intel, CHIPS Alliance | Parallel interface. Used, for example, in the Intel Stratix 10 FPGA. Latest spec is v2.0 (June 2021) |
| BoW (Bunch of Wires) | Open Compute Project (OCP) subgroup Open Domain Specific Architecture (ODSA) | Parallel interface. Championed by the Open Compute Project (OCP) |
| OpenHBI 1.0/2.0 | Xilinx, OCP ODSA | Parallel interface inspired by JEDEC HBM. The PHY can also support JEDEC HBM devices |
| PCIe | Intel, PCI-SIG | Serial interface. Uses PCIe as a short-range die-to-die interface. Not ideal (very high power). Used in the Intel Kaby Lake-G CPU |
| UCIe (Universal Chiplet Interconnect Express) | Intel, industry consortium | Parallel interface. Defines how to carry the PCIe and CXL protocols over a die-to-die interface, plus a raw/streaming mode. 1.0 spec released March 2022 |
| XSR (Extra Short Reach), USR (Ultra Short Reach), VSR (Very Short Reach) | OIF | Serial interfaces. Championed by the Optical Internetworking Forum (OIF) |
Selection of die-to-die interfaces in the market today

When comparing die-to-die interfaces, designers use several common metrics (a small worked example follows the list):

  • Data rate – the data rate of a single data I/O
  • Bump pitch – spacing between adjacent data I/Os (bumps) of the die-to-die PHY, on the die
  • Power efficiency – energy to transmit a bit to the other die. A common metric is pJ/bit
  • Edge density – combined metric of data rate and bump pitch. A common metric is Tbps/mm: how much bandwidth (how many I/Os, at what data rate) can be packed along 1mm of die edge
  • Area density – combined metric of data rate and bump pitch. A common metric is Tbps/mm2: how much bandwidth can be packed into 1mm2 of die area
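
To make the edge density metric concrete, here is a quick calculation with made-up but plausible parameters (the data rate, bump pitch, and bump row count are all assumptions, not the figures of any particular standard):

```python
# All three parameters are illustrative assumptions.
DATA_RATE_GBPS = 16   # data rate of a single data I/O
BUMP_PITCH_UM = 50    # spacing between adjacent bumps
BUMP_ROWS = 4         # rows of data bumps stacked behind the die edge

bumps_per_mm_of_edge = 1000 / BUMP_PITCH_UM          # 20 bumps per mm
data_ios_per_mm = bumps_per_mm_of_edge * BUMP_ROWS   # 80 data I/Os per mm
edge_density_tbps_per_mm = data_ios_per_mm * DATA_RATE_GBPS / 1000

print(f"edge density: {edge_density_tbps_per_mm:.2f} Tbps/mm")  # 1.28 Tbps/mm
# Real PHYs also spend bumps on power, ground, and clocks, so the
# usable figure is lower than this idealized one.
```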

A presentation from OCP Tech Week (November 2020), Die-to-Die Interface Comparison, compares several die-to-die interfaces along these metrics, so I will not repeat that here.

Chiplet Integration Methods

Chiplets can be integrated using a variety of methods. Intel and TSMC each offer similar, competing technologies to address different integration requirements. Some major categories are:

  1. Standard / Multi-Chip Package
  2. 2.5D Silicon interposer
  3. 2.5D Silicon “bridge”
  4. 3D Solder Bonding
  5. 3D Hybrid Bonding

The following image, which Intel presented at its 2017 Technology and Manufacturing Day, shows a good comparison between methods 1 to 3.

Comparing chiplet integration methods. Intel Technology and Manufacturing Day 2017

The following image, presented at Intel Architecture Day 2020, shows a comparison between methods 4 and 5.

Comparing solder and hybrid bonding chiplet integration methods. Intel Architecture Day 2020

Each chiplet integration method uses different technologies and manufacturing flows, leading to different properties. One key point of comparison, however, is the bump pitch of the die-to-die interface, which dictates how densely the wires of the die-to-die interface can be packed, and indirectly its bandwidth.

For reference, the bump pitches of some die-to-die technologies are on the order of (a quick density calculation follows the list):

  • Standard package ~100um
  • 2.5D advanced package (e.g. Intel EMIB) ~50um
  • 3D (solder) bonding (e.g. Intel Foveros) ~50um
  • 3D hybrid bonding (e.g. Intel Foveros Direct) <=10um
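
Because bumps are laid out on a two-dimensional grid, connection density scales with the inverse square of the bump pitch, so these pitch numbers translate into very different wire counts. A quick sketch:

```python
# Connection density on a 2D bump grid scales as 1/pitch^2.
PITCHES_UM = [("Standard package", 100),
              ("2.5D advanced package (e.g. EMIB)", 50),
              ("3D solder bonding (e.g. Foveros)", 50),
              ("3D hybrid bonding (e.g. Foveros Direct)", 10)]

for label, pitch_um in PITCHES_UM:
    bumps_per_mm2 = (1000 / pitch_um) ** 2
    print(f"{label:40s} {pitch_um:3d} um -> ~{bumps_per_mm2:7.0f} bumps/mm^2")
# Moving from 50 um solder bumps to 10 um hybrid bonds is a 25x jump
# in connection density.
```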

Standard / Multi-Chip Package

The standard package is the simplest integration method to understand. The dies are simply placed side by side on a single package substrate, and connected together via traces in the substrate. Such packages are also sometimes called multi-chip modules (MCMs). There is no advanced packaging technology involved.

2.5D Integration Methods

2.5D integration methods still arrange dies in two dimensions. The name “2.5D” is meant to convey that these advanced packaging methods achieve a much higher signal density than traditional 2D integration (the standard / multi-chip package).

A silicon interposer is a piece of silicon that sits between the dies and the package substrate. Since this layer is made of silicon, it can be manufactured using advanced silicon processes and achieve a dense bump pitch. However, as with any other silicon die, passing a signal from the top side to the bottom side of the interposer requires through-silicon vias (TSVs) to be manufactured into the silicon. Both the extra piece of silicon (the interposer) and the TSVs increase the cost of the solution.

2.5D silicon “bridge” packaging refers to integration methods that embed a smaller piece of “bridge” silicon within either a (non-silicon) interposer or the substrate, to act as the die-to-die interconnect. An example of this method is Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology.

3D Solder Bonding

3D integration methods truly stack dies on top of each other. More traditional 3D integration methods use solder bumps to bond two dies together. Solder bonding limits how densely the bumps can be fabricated, and therefore the bump pitch and signal density that the 3D die-to-die interconnect can achieve. Intel Foveros is an example of 3D solder bonding technology.

3D Hybrid Bonding (Hybrid Bond Interconnect; HBI)

3D hybrid bonding is also true 3D integration, but it does not use solder microbumps to bond the two dies. Instead, the interconnect of each die is exposed on the die surface, and the two surfaces are bonded together directly, without solder. Removing the solder bumps removes one of the main limiters of bump pitch and interconnect density, so hybrid bonding can achieve much smaller bump pitches. Intel Foveros Direct is an example of 3D hybrid bonding; it supports bump pitches of 10um and below.

Examples of Chiplets SoCs

Both Intel and AMD have utilized chiplet-based designs in their latest server and client CPUs. On the server CPU front, AMD’s 3rd-generation EPYC (Milan) processor comprises 8 CPU chiplets and 1 I/O chiplet, integrated in 2D. The AMD Milan-X processor with 3D V-Cache further has an SRAM cache die integrated in 3D on top of each compute die. Intel’s upcoming Sapphire Rapids Xeon server CPU is also a chiplet-based design, comprising 4 chiplets connected in 2.5D using EMIB.

The Intel Lakefield hybrid CPU was the first Intel SoC to use Intel’s Foveros 3D integration technology. I already mentioned the AMD Milan-X processor with 3D V-Cache as another example of 3D integration (manufactured by TSMC).

Intel’s Ponte Vecchio High Performance Compute GPU is one of the extreme examples of chiplet-based design, comprising a whopping 40+ tiles per SoC. See YouTube: Xe-HPC and Ponte Vecchio – Architecture Day 2021 | Intel Technology.

You can find some nice figures and animations of these chiplet SoC examples by following the links.

Conclusion

Chiplet-based SoC design is already mainstream in many high-end markets, like server CPUs, HPC GPUs, and high-end client CPUs. This trend will likely only continue, as more and more SoCs become complex enough to realize the benefits of disaggregating a single monolithic die into multiple smaller chiplets. Chiplet-based designs offer many advantages: potentially lower cost, overcoming the maximum die size (the reticle limit), higher density, more modularity, and more reuse. But they also come with many challenges, like higher design complexity, yield loss from packaging defects, and a tangle of proprietary protocols and competing standards. Universal Chiplet Interconnect Express (UCIe) is one standard that is gathering momentum in the industry, but it will be a long journey to the vision, and holy grail, of building SoCs by mixing and matching chiplets from different vendors and foundries.

This has been a fun topic to research (and a long topic to write about). Everything about chiplets is so new and fast-developing that anything written about them risks becoming outdated as soon as it is published. I’m excited to follow these ongoing developments and see where this emerging field goes.

Feel free to leave me some comments below if you have feedback about this article!
