
Blackwell GPU Overheating Forces Nvidia To Revamp Server Design, Some Customers Face Delays of Over a Year

By James Morales

Key Takeaways

  • Nvidia is facing production delays for its Blackwell GB200 servers.
  • The powerful 72-GPU racks need to be redesigned due to overheating.
  • Due to massive demand, there is currently a one-year waiting list for Blackwell chips.

Multiple redesigns have delayed Nvidia’s timeline for shipping its Blackwell GPU solutions, meaning that some customers must now wait over a year to receive their orders.

Just two months after Nvidia’s chip supplier TSMC had to fix a GPU design flaw, The Information reported that Blackwell servers intended for data centers are overheating, prompting another redesign and further production delays.

Blackwell Servers Overheating

Unlike Nvidia’s consumer graphics cards, Blackwell processors are intended for large data centers and dedicated AI supercomputers.

As such, the company predominantly sells its most powerful AI chips not individually but in preconfigured server racks of eight to 72 GPUs, with the largest configurations also packing built-in memory and parallel CPU cores.

According to The Information’s report, the overheating issue affects the largest GB200 servers, a liquid-cooled data center platform consisting of 36 Grace CPUs and 72 Blackwell GPUs per rack.

With their immense processing capacity, GB200s reportedly consume up to 120 kW of electricity each. (For comparison, a typical desktop computer uses around 200–300 watts.)

For servers that cost around $3 million apiece, overheating is a serious issue, as it could throttle GPU performance and damage internal components.

While the first GB200 servers were initially slated for delivery this quarter, redesigns mean they won’t ship until at least 2025.

Big Tech clients will be among the first to receive the new product. However, massive demand for AI chips means others will have to wait over a year.

Nvidia Blackwell Chips Sell Out

When recently questioned by Morgan Stanley analysts, Nvidia’s management said the supply of Blackwell GPUs has already sold out for the next 12 months.

With a near-monopoly on the most powerful GPUs used for AI training, Nvidia has struggled to keep up with demand for its AI accelerators.

Previous generations of chips based on the Hopper architecture, including the H100 and H200, initially faced similar wait times of up to 11 months. However, as Nvidia scaled its production capacity, the waiting time was reduced to around three months.

Who Will Receive Nvidia AI Chips?

Presold Blackwell units have been snapped up by Nvidia’s largest customers, including Amazon, CoreWeave, Google, Meta, Microsoft and Oracle. Meanwhile, smaller companies have been relegated to the back of the queue.

Microsoft is expected to receive one of the largest allocations. In an October post on X, the company revealed that it had built the first cloud server equipped with Blackwell GB200 chips ahead of larger shipments expected in December.

During Nvidia’s AI summit in Japan on Tuesday, Nov. 12, CEO Jensen Huang announced that SoftBank is building Japan’s most powerful AI supercomputer using the Blackwell platform.

As part of its AI push, the Japanese conglomerate will be the first to receive Nvidia’s DGX B200, a smaller server than the GB200 that houses eight GPUs.
