It’s AWS re:Invent this week, Amazon’s annual celebration of cloud computing in Las Vegas, and as usual, the company has a lot to announce, so much so that it can’t fit everything into its five(!) keynotes. Ahead of the show’s official opening, AWS on Monday detailed a number of updates to its overall data center strategy that are worth paying attention to.
Most importantly, AWS will soon start using liquid cooling for its AI servers and other hardware, regardless of whether those rely on its homegrown Trainium chips or Nvidia accelerators. AWS specifically notes that Trainium2 chips (which are still in preview) and “rack-scale AI supercomputing solutions such as the NVIDIA GB200 NVL72” will be cooled this way.
AWS emphasizes that these updated cooling systems can combine air cooling and liquid cooling. After all, data centers still house plenty of other servers, handling networking and storage, for example, that don’t require liquid cooling. “The flexible, multi-modal cooling design allows AWS to deliver maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models,” AWS explains.
The company also announced that it is moving to simpler electrical and mechanical designs for its servers and server racks.
“The latest AWS data center design improvements include simplified electrical distribution and mechanical systems, enabling 99.9999% infrastructure availability. The simplified systems also reduce the potential number of racks that can be affected by electrical issues by 89%,” the company notes in its announcement. AWS does this in part by reducing the number of times electricity is converted on its way from the electrical grid to the server.
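To put that availability figure in perspective, “six nines” translates into a very small downtime budget. The sketch below is illustrative arithmetic only, not anything AWS published beyond the 99.9999% number itself:

```python
# Translate an availability percentage into the maximum downtime it permits
# per year. Only the 99.9999% figure comes from AWS; the rest is plain math.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def max_downtime_seconds(availability_pct: float) -> float:
    """Seconds of downtime per year allowed at a given availability level."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)

print(round(max_downtime_seconds(99.9999), 1))  # six nines: ~31.5 seconds/year
print(round(max_downtime_seconds(99.99), 1))    # four nines, for comparison: ~3153.6 seconds/year
```

In other words, the quoted figure budgets for roughly half a minute of infrastructure unavailability per year.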
AWS doesn’t provide much detail beyond that, but this likely means using DC power to run the servers and/or the HVAC system, avoiding many of the AC-DC-AC conversion steps, with their inherent losses, that would otherwise be necessary.
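Why fewer conversion steps matter comes down to compounding: each stage wastes a few percent, and the losses multiply. The per-stage efficiencies below are hypothetical round numbers for illustration, not figures from AWS:

```python
# Hypothetical per-stage efficiencies to show why cutting AC-DC-AC hops helps;
# none of these numbers come from AWS's announcement.
def chain_efficiency(stage_efficiencies: list[float]) -> float:
    """Overall efficiency of power-conversion stages applied in series."""
    eff = 1.0
    for stage in stage_efficiencies:
        eff *= stage
    return eff

# e.g. grid -> UPS (AC-DC) -> UPS (DC-AC) -> rack PSU (AC-DC), assumed values
many_hops = chain_efficiency([0.97, 0.96, 0.97, 0.96])
# e.g. a shorter DC-oriented path with fewer conversions, assumed values
few_hops = chain_efficiency([0.97, 0.98])

print(f"{many_hops:.3f} vs {few_hops:.3f}")  # -> 0.867 vs 0.951
```

Even with optimistic stage efficiencies, the longer chain loses over 13% of the power drawn from the grid before it reaches the server; shortening the chain recovers most of that.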
“AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers around the world,” Prasad Kalyanaraman, vice president of Infrastructure Services at AWS, said in Monday’s announcement. “These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But even more exciting is that they are designed to be modular, so we can retrofit our existing infrastructure with liquid cooling and energy-efficiency improvements to power generative AI applications and reduce our carbon footprint.”
In all, AWS says, the new multi-modal cooling system and improved power delivery system will allow the organization to “support a 6x increase in rack power density over the next two years, and another 3x increase in the future.”
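Taken together, those two stated multipliers compound. This is simple arithmetic on the quoted figures, nothing more:

```python
# Compounding the multipliers AWS quotes: a 6x increase in rack power density
# over the next two years, followed by a further 3x increase later.
near_term_multiplier = 6
future_multiplier = 3

total = near_term_multiplier * future_multiplier
print(total)  # -> 18, i.e. 18x today's rack power density if both land
```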
In this context, AWS also notes that it is now using artificial intelligence to predict the most effective way to place racks in a data center to reduce the amount of unused or underutilized energy. AWS will also roll out its own control system across its electrical and mechanical devices in the data center, which will come with built-in telemetry services for real-time diagnostics and troubleshooting.
“Data centers must evolve to meet the transformative demands of artificial intelligence,” said Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA. “By enabling advanced liquid cooling solutions, AI infrastructure can be cooled efficiently while reducing power use. Our work with AWS on the design of its liquid cooling rack will allow customers to run demanding AI workloads with exceptional performance and efficiency.”