The Rise of the Chiplet (semiwiki.com)
79 points by 11thEarlOfMar on April 5, 2023 | 12 comments


I'm a bit baffled that chiplets have been stealing all the headlines on the tech news sites ever since AMD started using them in Ryzen, with people preaching this as the killer innovation from AMD & TSMC, when we've actually seen them in consumer chips for over 15 years now, from Intel at least.

The Intel Pentium D in 2005 was basically two Pentium CPU dies glued close to each other on the same package to create the first ever consumer dual core CPU.[1]

The Intel Core 2 Quad from 2007 was basically two separate Core 2 Duo dies glued close to each other on the same package to create the first ever quad core consumer CPU.

I had a 2011 laptop with an Intel Celeron P4600 chip that had discrete CPU and GPU dies, both on the same package.[2] I remember this clearly since I first took it apart to clean the fan and apply new thermal paste, and I was surprised to see two different dies on the package.

And those are just the chiplet-based chips I know off the top of my head; there are probably even earlier designs I don't know about. So is there some hidden chiplet agenda that sprang up recently which I'm missing, or what's going on?

[1] https://www.hardwarezone.com.sg/m/feature-intels-pentium-xe-...

[2] https://www.x86-guide.net/Xhoba/en/collection/Intel-Celeron-...


Interposers.

The start of "modern" chiplets was AMD's R9 Nano GPU, which I believe was the first to use HBM / stacked memory. Each stack has 1024 microbumps, and the GPU has 4 stacks, meaning 4096 bumps/pins connect the GPU with its RAM. EDIT: The pitch between these microbumps is 40 microns / 0.040 millimeters.

This technology is "advanced packaging": the ability to provide thousands, or even tens of thousands, of bumps / effective pins to serve the signals going across our computers.

-------------

Yeah, chiplets, or even socket-to-socket communications, have existed for decades. But today we can take advantage of thousands, tens of thousands, or even hundreds of thousands of external "pins" that connect these components together.

Here's a rundown on Intel's technology: https://www.anandtech.com/show/16823/intel-accelerated-offen...

> Intel is also stating today that it will be using its second generation Foveros technology on the platform, implementing a bump pitch of 36 micron, effectively doubling the connection density over the first generation.

36 microns, or 36 um (micrometers), of bump pitch on Intel's advanced packaging technology. That's a lot of pins per mm^2!
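
To put rough numbers on that, here's a back-of-the-envelope sketch (assuming a simple square bump grid, and assuming first-gen Foveros was the ~50 um pitch Intel disclosed for Lakefield):

  # Rough bumps-per-area for a square grid at a given pitch.
  # Real bump arrays are often staggered/hexagonal, so treat this as an estimate.
  def bumps_per_mm2(pitch_um: float) -> float:
      pitch_mm = pitch_um / 1000.0
      return 1.0 / (pitch_mm ** 2)

  for name, pitch_um in [("HBM microbumps (~40 um)", 40.0),
                         ("Foveros gen 1 (~50 um)", 50.0),
                         ("Foveros gen 2 (36 um)", 36.0)]:
      print(f"{name}: {bumps_per_mm2(pitch_um):,.0f} bumps/mm^2")

  # Prints roughly 625, 400, and 772 bumps/mm^2 respectively.
  # (50/36)^2 ~= 1.9, which lines up with "effectively doubling" above.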

Now I don't know how these guys are lining up 0.036 millimeter bumps and reliably making connections. I kind of imagine a very tiny soldering iron, but I'm probably wrong.

-------

In practice, AMD has led the way. Not only with "chiplets", but also off-die L3 cache (aka X3D cache), adding 64MB of external SRAM to their chips through advanced packaging. So these thousands of microbumps are fast, reliable, and low-power enough to provide full-speed caches (something not quite possible with those earlier Pentiums you were talking about).


> Interposers.

Interposers are just one way of packaging MCM designs. AMD's RX 7900 series doesn't use a silicon interposer, yet it's still an (early) MCM design (although of course it's only pulling out memory controllers and cache, not multi-GCD). And technically, since it's already a fiberglass package, there's no reason you can't stick things right onto a PCB instead; it's just a matter of convenience for partners placing/routing one thing instead of 7 things.

https://www.techpowerup.com/301071/amd-explains-the-economic...

Infinity Fanout integration is a different type of MCM integration with some of its own upsides (cost) and downsides (higher power, less bumpout). There are also bridges, and actual 3D stacks of multiple compute dies (or 2.5D with memory/etc.) that don't use interposers at all.

https://semiengineering.com/using-silicon-bridges-in-package...

All of these are technically MCM - MCM is any sort of multi-chip package. MCM also really does include things like Core2Quad, Pentium-D, Crystalwell, Xenos, and various IBM modules in the 70s and 80s. The idea has been kicking around for a long time and "multi-chip module" refers to all of them collectively, not just interposer-based CPUs.

https://en.wikipedia.org/wiki/Multi-chip_module

And while yes, stacking and direct bonding are great and lower power requirements a lot... there are still stacked dies with no interposer! Ryzen X3D chips are a great example: V-Cache is an MCM, but there's no interposer there. And technically Ryzen itself (without V-Cache) is also MCM and still sees large benefits.


Pentium D was a multi-chip module, and so are AMD's chiplets, but Pentium D didn't use chiplets:

> ICs that can perform most, if not all of the functions of a component of a computer, such as the CPU. Examples of this include implementations of IBM's POWER5 and Intel's Core 2 Quad. Multiple copies of the same IC are used to build the final product. In the case of POWER5, multiple POWER5 processors and their associated off-die L3 cache are used to build the final package. With the Core 2 Quad, effectively two Core 2 Duo dies were packaged together.

> ICs that perform only some of the functions, or "Intellectual Property Blocks" ("IP Blocks"), of a component in a computer. These are known as chiplets.[3][4] An example of this are the processing ICs and I/O IC of AMD's Zen 2-based processors.

https://www.wikipedia.org/wiki/Multi-chip_module

It seems chiplets are a distinct subset, but the idea is very similar.

Also, AMD used different process nodes for different parts, which wouldn't make sense in just a dual-CPU package. Certain circuitry is now starting to scale differently as process nodes shrink, so they may want a different, older process node for cache than for logic, etc., which could emphasize the different-functions aspect and explain why it is in the news more. 3D V-Cache uses an older node for that reason, though it needs denser connectivity than package-level chiplet links, so I think it uses direct die stacking rather than an interposer.

You might also want to buy third-party IP that is only available in the design rules of a different foundry. Chiplets let you integrate it, where a monolithic SoC approach wouldn't.

Shot noise is a bigger problem with EUV, so you can potentially get better yields by splitting functional components into different chiplets, rather than the alternative of fusing off defective parts of a chip and selling it as a lower tier. Those are the reasons I see for why chiplets are getting lots of press and attention even though the packaging technology may be old.
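
To illustrate the yield side, here's a toy sketch using a simple Poisson defect model (the defect density is made up for illustration, not real foundry data):

  import math

  # Toy Poisson yield model: Y = exp(-D * A), with defect density D
  # (defects/mm^2) and die area A (mm^2).
  D = 0.002       # hypothetical defects per mm^2
  AREA = 600.0    # total silicon area, mm^2

  def die_yield(area_mm2: float) -> float:
      return math.exp(-D * area_mm2)

  print(f"Monolithic 600 mm^2 die:  {die_yield(AREA):.0%} yield")      # ~30%
  print(f"Each of 4x 150 mm^2 dies: {die_yield(AREA / 4):.0%} yield")  # ~74%

  # Smaller dies waste less wafer per defect, ignoring the added packaging
  # cost and known-good-die testing that chiplets require.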

Initial reasons for mixing nodes may have been more business-related than technical: AMD had to buy a certain amount of output from GlobalFoundries after spinning it off.


I think another important distinction between today's chiplet CPUs and the Pentium D/Core 2 Quad is that those early parts were still using a traditional shared front-side bus connecting the CPU cores to the memory controller/northbridge residing on the motherboard. So those MCMs were functionally equivalent to a dual-socket system even from a performance perspective, but harder to cool.

AMD's chiplet-based CPUs gain some real benefits from using faster or lower-power short-range links that would not work between separate CPU sockets, and more advanced packaging using interposers or bridges further reduces the power and performance costs of communication between chiplets. These benefits mattered even for AMD's early chiplet-based CPUs that were homogeneous rather than using a mix of specialized dies.


This is about as meaningful a distinction as the marketing copy that defined the GeForce as "the world's first graphics processing unit (GPU)" by creating a definition that matched their exact specs while definitionally excluding its competitors. You have to do 10 million polys/sec on a single monolithic chip or it's not a real GPU guys!!!! Voodoo doesn't count because it's not monolithic!

> ICs that perform only some of the functions, or "Intellectual Property Blocks" ("IP Blocks"), of a component in a computer.

Well, a Pentium D or Core2 Quad didn't have a northbridge onboard, so each chiplet only performed "some of the functions" of a CPU.

Which functions are the important ones that count? Well, obviously the ones that AMD did, and not the ones that Intel did, of course. I mean, you can't really have a CPU without a memory controller, so... kind of an important one. One might describe that northbridge as... an IO die. Just not one that lives on the package, because that's not how it was done at the time (monolithic CPUs had external northbridges too).

And obviously that's changed over time, components of the CPU itself go through the same internalize-and-integrate/externalize-and-disambiguate lifecycle as has been well-remarked previously in other aspects of computer design. On-package northbridges aren't something unique to MCM either.

The parent comment that ascribes it to viral marketing and clever rebranding is correct. Everything that is old is new again - the IO die is just a northbridge-as-a-chiplet and the CCDs are pretty similar to pentium-d or core2quad core chiplets. Just branded.

There is very much a lesson to be learned here as far as technical marketing goes - how would customers know how awesome your thing is if you don't give it a special name to tell them? It's a Graphics Processing Unit, of course it's better than the competition's Boring Old Junk, it's got way more quadroflops and kilopixels! It's not an L3 cache, dad, it's AMD GamerCache, or Radeon Infinity Cache, it's totally different! It's not memory paging/swapping, it's HBCC! It's not PCIe Resizable BAR, it's Smart Access Memory!

If you don't give it a brand name then people won't know how awesome it is and how lame your competition is for not having your exact implementation of the idea. Or even if they do, hey, yours is the one with the brandname. You can't be an ultrabook, that's our trademark, you're just some underpowered thin-n-light laptop.

https://www.vortez.net/news_story/amd_gamecache_canny_market...

https://www.amd.com/system/files/documents/infinity-cache-te...

https://itigic.com/what-is-amd-hbcc-features-and-how-it-work...

https://www.gpumag.com/smart-access-memory/

It's not that AMD didn't make any improvements - technology marches onwards and they built a good system. It's just not really a difference in kind in the sense that you can draw some particular bright line and say "well this is MCM and this isn't"... these ideas have been kicking around for a long time and it's not AMD who invented them, even if they improved them.

Attempts to do so fall into the same trap as "it's only a GPU if it's a GeForce 256 descendant", because you end up with a definition specifically drawn to include the things you like and exclude the things you don't, rather than technically coherent distinctions. It's still the same general idea even if Intel's implementations weren't commercially successful (although I think Core2Quad was pretty successful overall).

--

And while I'm picking on NVIDIA with the "GPU" marketing thing... it's also not like NVIDIA didn't improve the state of the art too! Having 2D and 3D in the same chip was way more convenient overall, the "GPU" was quite revolutionary. But that's also pretty much the baseline expectation: a generationally newer product should be significantly better and will likely be conceived differently to match the nodes and the tech of the time.

The reason AMD went super heavy on caches on all their 7nm products (CPU and GPU) was to take advantage of TSMC's SRAM density... and they were well-placed with a good architecture to do that too! But a lot of these things are just "products of their own time" in some ways: not having L3 is common on pre-TSMC N7 GPUs not because nobody had thought of L3 cache before, but rather because SRAM isn't a very efficient use of die area on older nodes and it wasn't a good engineering tradeoff. And then it was.

Like other kinds of alt-history hypotheticals, people tend to underweight the "overall forces of the times" that would have tended to push things in the same direction even if some specific decision had been made differently or whatever. Someone else would have thought of "wow let's use this high-density SRAM that N7 gives us and throw a big cache on our product".


Marketing & viral awareness go brrrrrr

Edit: since this was pretty low effort and likely violates the community guidelines, I'll add this:

I would guess the reason we're seeing more content related to chiplets and their implications is a combination of a big player adopting them for its main product line(s) and the resulting PR / marketing buzz having 'trickle down' effects on the general industry discourse as a whole.


Also the USG is making a big push to reshore chip manufacturing, especially in more future-facing areas such as chiplets.


Took me longer than I care to admit to realize that USG stood for United States Government.

Regarding the statement, I would also point out that there is an almost global push towards promoting domestic chip manufacturing and reducing reliance on globalized supply chains for critical infrastructure, not just in the United States.

In my geopolitical armchair expert opinion, I would guess this is in part caused by the conflict in Ukraine as well as rising tensions between the various 'global powers' further compounding the general loss of confidence that followed COVID-19.


This push began before Covid and well before the war in Ukraine went hot - but those two factors certainly increased the urgency. At root the push in the US began due to the rise of China as a military competitor in the early 2010s, and the consequent realization that TSMC might be blockaded, captured or destroyed.


Chiplets are super neat. I see a near future where we can get more standardized packaging and easier integration to drive the cost of developing custom ICs way down.

I really hope that over the next 5-10 years we can get more advanced manufacturing like microvias, blind/buried vias, and via-in-pad for PCBs at a lower cost, to cut down the cost of iterating on hardware designs.

Things like KiCad, FreePCB and Horizon EDA have really been crushing it from an EDA point of view compared to where things were even 3-5 years ago.

The only gripe I have with these packages is the library management, but that isn't really 100% their fault :(


  Open Chiplet Ecosystem 
  Open-Source EDA Toolchain (24h dev cycle) 
  RISC-V: open-source / licensed / classified
  Upcoming FPGAs made-in-USA by TSMC Arizona
  Domain-specific accelerators
This interoperability vision aims to improve US chip supply chain resilience, integrity, reshoring and reusability of IP blocks in the wake of Moore's Law.

> For chiplet adoption, the industry needs to worry not just about the die-to-die interfaces and packaging technology but the whole chiplet economy. For example, how to describe a chiplet before building it in order to achieve efficient modularity .. Some of the other areas to get addressed include: How to address known-good-die (KGD) in business contracts. How to accomplish architecture exploration? How to handle business logistics?

Until now, proprietary packaging has limited the "chiplet economy". The current iteration of standards initiatives started around 2018.

https://www.opencompute.org/blog/the-ocp-open-domain-specifi...

> Decades of progress with general-purpose CPUs have slowed, while performance requirements of workloads have catapulted, driving significant demand in domain-specific accelerators ... The ODSA subproject’s mission is to define an open interface and architecture that enables the mixing and matching of silicon chiplets from different vendors via an open marketplace onto a single SoC.

Linux Foundation, https://www.chipsalliance.org/

> CHIPS Alliance develops high-quality, open source hardware designs relevant to silicon devices and FPGAs ... Companies and individuals can work together to develop open source CPUs, various peripherals, and complex IP blocks.

In parallel, there is funded university research to create a reliable open-source EDA toolchain, https://woset-workshop.github.io/WOSET2022.html

> Chisel and Verilator provide an open-source stack for digital design. For ASIC synthesis, we have open-source tools like OpenROAD, Yosys, and Magic. OpenROAD is a project to deliver an end-to-end silicon compiler in open source. The aim is to “democratize hardware design” by providing an automated layout generation flow from a design in RTL to GDS files used to produce silicon. Google and Efabless offer free production of chips in a multi-project wafer if the project is available in open source.
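
As a small taste of that stack, here's a minimal sketch driving Yosys from Python to synthesize a toy counter into a JSON netlist (file names are hypothetical; assumes Yosys is installed and on your PATH):

  import subprocess
  from pathlib import Path

  # A toy RTL design, written out just for this sketch.
  Path("counter.v").write_text("""\
  module counter (input clk, input rst, output reg [7:0] q);
    always @(posedge clk)
      if (rst) q <= 8'd0;
      else     q <= q + 8'd1;
  endmodule
  """)

  # Yosys script: read RTL, run generic synthesis, write a JSON netlist that
  # downstream open-source tools (e.g. nextpnr, OpenROAD flows) can consume.
  subprocess.run(
      ["yosys", "-p",
       "read_verilog counter.v; synth -top counter; write_json counter.json"],
      check=True,
  )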

David Patterson & John Hennessy's 2018 Turing Award lecture explained why "democratizing hardware design" is needed to reduce cost/time for domain-specific computing, https://news.ycombinator.com/item?id=18118957

Videos from 2021 DARPA ERI (Electronics Resurgence Initiative) conference: https://youtube.com/playlist?list=PL6wMum5UsYvaKtr1GOr-rqhD_... & https://eri-summit.darpa.mil/2021-Agenda



