Why is There a Shortage of Nvidia AI Chips?

TSMC’s (Taiwan Semiconductor) 2.5D advanced packaging technology, CoWoS (Chip-on-Wafer-on-Substrate), is currently the primary packaging technology used for AI chips. CoWoS production capacity is a major bottleneck in AI chip output and will remain a constraint on AI chip supply in 2024.

Nvidia H100 (AI chips) are built into HGX AI supercomputer systems that weigh 70 pounds and have 35,000 components.

GPUs use higher-specification HBM (stacked computer memory), which requires integrating the core dies using 2.5D advanced packaging technology. The initial chip-stacking stage of CoWoS packaging, known as Chip on Wafer (CoW), is manufactured primarily at the fab using a 65-nanometer process. Through-silicon via (TSV) processing follows, and the finished products are then stacked and packaged onto the substrate, a step known as Wafer on Substrate (WoS).

Nvidia is using about 60% of TSMC’s CoWoS production for its AI GPU chips. TSMC’s CoWoS production capacity was about 120,000 units in 2023 and will double to 240,000 units in 2024. TSMC’s monthly CoWoS packaging capacity is expected to increase to 26,000 to 28,000 wafers per month in 2024. Nvidia has also started exploring secondary suppliers, placing orders with Amkor Technology and United Microelectronics (UMC).
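To make the capacity figures above easier to compare, here is a minimal arithmetic sketch in Python. It uses only the numbers quoted in this article. The function and constant names are illustrative, and it assumes that the "units" in the annual figures and the "wafers" in the monthly figures refer to the same thing, which the article does not confirm.

```python
# Figures quoted in the article (approximate).
ANNUAL_CAPACITY_2023 = 120_000   # CoWoS units in 2023
ANNUAL_CAPACITY_2024 = 240_000   # CoWoS units in 2024
NVIDIA_SHARE_PCT = 60            # Nvidia's approximate share of CoWoS output
MONTHLY_RANGE_2024 = (26_000, 28_000)  # expected wafers per month in 2024


def average_monthly(annual_units: int) -> int:
    """Average monthly output implied by an annual total."""
    return annual_units // 12


def nvidia_units(annual_units: int, share_pct: int = NVIDIA_SHARE_PCT) -> int:
    """Units going to Nvidia, assuming the share applies evenly to the total."""
    return annual_units * share_pct // 100


def annualized(monthly_units: int) -> int:
    """Annual output if a monthly rate were sustained for 12 months."""
    return monthly_units * 12


if __name__ == "__main__":
    print("2023 -> 2024 growth:", ANNUAL_CAPACITY_2024 / ANNUAL_CAPACITY_2023)
    print("Average per month in 2024:", average_monthly(ANNUAL_CAPACITY_2024))
    print("Nvidia units in 2024:", nvidia_units(ANNUAL_CAPACITY_2024))
    low, high = MONTHLY_RANGE_2024
    print("Annualized monthly range:", annualized(low), "to", annualized(high))
```

The 240,000 annual figure averages out to 20,000 per month, below the quoted 26,000 to 28,000 monthly range. One plausible reading, offered here as an assumption, is that the monthly range describes the run rate reached late in 2024 rather than the average across the year.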

The bottleneck within the CoWoS capacity is primarily due to the supply-demand gap in the interposer. TSMC’s 2.5D packaging technology, Chip-on-Wafer-on-Substrate (CoWoS), integrates multiple active silicon dies on a passive silicon interposer. The interposer acts as a communication layer for the active die (logic and memory) on top.

The TSV (through-silicon via) process is complex, and expanding capacity requires more high-precision equipment. However, the long lead time for high-precision equipment, coupled with the need for regular cleaning and inspection of existing equipment, has resulted in supply shortages.

The high pad count and short trace length requirements of HBM (stacked memory) call for 2.5D technologies like CoWoS, which enable the many dense, short connections that cannot be made on a PCB or even a package substrate. All HBM systems are currently packaged on CoWoS, and all advanced AI accelerators use HBM.

Although TSMC dominates the CoWoS advanced packaging market, other Taiwanese companies such as UMC (United Microelectronics), ASE Technology Holding, and Powertech Technology are gradually entering it.

Among them, UMC said during an investor conference in late July 2023 that it is accelerating the deployment of silicon interposer technology and capacity to meet customer needs in the 2.5D advanced packaging sector. TSMC and UMC are both based in Hsinchu, Taiwan, but TSMC generates about 7 to 10 times the revenue of UMC.

TSMC will allocate 10% of its total capital expenditure for 2024 to expanding capacity in advanced packaging, testing, photomasks, and other areas.

In June 2023, TSMC announced the opening of its Advanced Backend Fab 6 in Zhunan. This fab has enough cleanroom space for potentially 1 million wafers per year of 3DFabric capacity, which includes not only CoWoS but also SoIC and InFO technologies. The fab is reportedly larger than all of TSMC’s other packaging fabs combined.

TSMC is spending $2.87 billion to build a new advanced chip packaging facility in Tongluo Science Park. The facility will employ 1,500 people and is scheduled to start production in 2027.
