NVIDIA has announced it has found a solution to the water crisis plaguing data centers: hot water. The company says that the next generation of AI factories will run hotter than a hot tub, yet consume less energy and water thanks to a breakthrough in liquid cooling technology.
In a new blog post published during London Climate Week, NVIDIA detailed how its upcoming Rubin-generation AI systems will become the industry’s first fully liquid-cooled AI infrastructure, eliminating fans entirely and replacing traditional air-cooling systems with closed-loop liquid cooling.
Liquid cooling is nothing new; it’s been around since the 1970s. In some regards, gamers are ahead of the enterprise: a growing number of them are replacing fans with liquid cooling systems in their high-powered rigs, and the industry is catering to them.
Like virtually all new technologies, liquid cooling has been slow to gain traction in the enterprise. Part of that is the enterprise’s typical caution about adopting anything new, but beyond that, liquid cooling has been slow on the uptake because deploying it is not a trivial matter. The data center has to be rearchitected, sometimes from the ground up, to accommodate liquid cooling instead of air cooling. So while companies sing the praises of liquid cooling for its efficiency and heat reduction, the fact is that rollout has been slow.
NVIDIA says the Rubin generation of NVIDIA AI infrastructure is the world’s first to achieve 100% liquid cooling: every chip and every networking component is cooled entirely by liquid in a closed loop, with no fans anywhere in the system. According to the company, this will allow for previously unheard-of levels of computing density while improving sustainability.
The key innovation is NVIDIA’s ability to operate cooling systems with liquid temperatures reaching 45 degrees Celsius (113 degrees Fahrenheit), roughly the temperature of a hot tub. The conventional wisdom on liquid cooling has long been to get things as cold as possible through active refrigeration. NVIDIA is simply saying that things don’t have to be so cold.
By allowing the coolant to remain much warmer than previously thought acceptable, data centers can dramatically reduce or eliminate the need for energy-intensive chillers and cooling towers. Cooling towers dissipate heat through evaporation, which causes data centers to consume vast amounts of water, something that has become as troubling to residents near data center facilities as the power requirements.
The move to liquid cooling was driven by necessity. Air cooling works only with low-density heat. Anything below 50 kilowatts can be effectively cooled by heat sinks and fans. Above 50 kilowatts, the only viable alternative is liquid cooling. AI racks have been easily surpassing 50 kilowatts, and in some cases 100 kilowatts. Liquid cooling was not optional; it was mandatory.
The company is also codifying these practices through its NVIDIA DSX AI Factory reference design, which outlines best practices for designing, building and operating large-scale AI facilities.
Allowing for natural heat dissipation rather than actively cooling with refrigeration is nothing new. Intel first realized it could get away with letting its servers run a little hotter than usual almost two decades ago. But in this case, NVIDIA is doing it system-wide. Up to now, liquid cooling has been reserved exclusively for the CPU and GPU. Now it’s being extended to cover networking chips, memory, and system-level components as well.




