From Supercomputer Waste to Community Heat: LRZ Continues Energy Efficiency Leadership Through New Initiatives
Currently, the German government is preparing the much-discussed Energy Efficiency Act (Energieeffizienzgesetz), which will also heavily affect the operation of data centers in Germany. Admittedly, energy consumption in supercomputing centers is high. However, supercomputing centers have traditionally been at the forefront of co-developing energy-efficient solutions with their technology partners and of exploring further ways to reduce consumption. The Leibniz Supercomputing Centre (LRZ) has started to extend hot-water cooling, which has been the standard for its leadership-class system SuperMUC for more than 10 years, to further compute systems. And in the near future the institute will even produce its own electricity.
Energy and power consumption are always hot topics among researchers and staff at the Leibniz Supercomputing Centre (LRZ). Together with IBM and Lenovo, LRZ has developed a pioneering hot-water cooling system for its supercomputers. Additionally, the data center has been using electricity from renewable sources for several years. Moving forward, LRZ will also become an electricity producer itself: photovoltaic modules will be installed on its office buildings in Garching near Munich. “This will enable the LRZ to significantly improve its environmental and CO2 balance,” says Prof. Dr. Dieter Kranzlmüller, Director of the LRZ. “We have been running our computers on 100 percent renewable energy for over a decade. By now, we can even do without adding antifreeze, making our operations even more environmentally friendly. We use the waste heat to air condition our offices. And in the near future we will produce our own electricity.”
Self-sufficient heating with waste heat
When the sun is shining, the planned solar panels will provide up to 300 kilowatts of power at peak times, which should add up to almost 300,000 kilowatt hours per year: this should cover LRZ’s electricity requirements excluding all computing resources. The data center has been self-sufficient in terms of heating for years: in 2012, the then-new HPC system SuperMUC was cooled with hot water for the first time. LRZ developed this cooling system together with IBM and Lenovo and has continuously optimized it. Today, it is used in many data centers around the world. In the racks of the successor system SuperMUC-NG, water with a temperature of up to 50 degrees Celsius circulates and is heated further by the system’s waste heat. Captured in this way, the heat can be used to warm LRZ’s offices, but also to produce cold water for other, older compute systems with the help of adsorption chillers. “We could even send heat to other buildings on campus,” says Kranzlmüller. Other smaller high-performance computing (HPC) systems, such as the CoolMUC Linux cluster or the powerful Cerebras CS-2 system with HPE Superdome Flex servers, will soon also be equipped with hot-water cooling.

“Due to the current energy crisis and new government requirements to generate heat from sources other than fossil fuels,” says Kranzlmüller, “interest in our waste heat is growing in the neighbourhood.” At the research campus in Garching, where LRZ is based, waste heat usage will be expanded: LRZ’s waste heat shows promise as an additional heat source alongside the existing district heating network. To deliver it, a heat pump must be installed, either at LRZ or at one of the consumers, to raise the waste heat from the computers to up to 100 degrees Celsius. Recently, building management ensured that the innovative warm-water or, better, hot-water cooling system works even without antifreeze by cleverly modifying pipes and water storage tanks.
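As a rough plausibility check on the photovoltaic figures, the quoted peak output and annual yield fit together if one assumes roughly 1,000 equivalent full-load hours per year, a typical value for southern Germany; the following short Python sketch only illustrates this arithmetic and the full-load-hour figure is an assumption, not an LRZ number.

    # Back-of-the-envelope check of the PV figures quoted above.
    # The 1,000 full-load hours per year is a typical value for southern
    # Germany and is an assumption here, not an LRZ-provided number.

    PEAK_POWER_KW = 300          # planned peak output of the rooftop PV modules
    FULL_LOAD_HOURS = 1_000      # assumed equivalent full-load hours per year

    annual_yield_kwh = PEAK_POWER_KW * FULL_LOAD_HOURS
    print(f"Estimated annual yield: {annual_yield_kwh:,.0f} kWh")
    # -> Estimated annual yield: 300,000 kWh, consistent with the figure above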
Operating data improves control
Together with fellow Gauss Centre for Supercomputing HPC centers and other research institutes, LRZ is working on further solutions to reduce the power consumption of supercomputers while still increasing their performance. When SuperMUC-NG is running at full speed, it consumes 3,400 kilowatts of electricity, but most of the time the system runs at a reduced clock frequency of 2.3 gigahertz instead of the possible 2.7 gigahertz. Many applications do not benefit from a higher clock speed anyway, so power consumption can be reduced by around 30 per cent on an annual average. Better job scheduling also reduces the energy requirements of supercomputing, and LRZ uses Energy Aware Scheduling (EAS) for SuperMUC-NG. Jobs are combined in such a way as to keep memory and processors as busy as possible, which also improves the overall power consumption. More tools are being developed for EAS, with a current focus on data transfer within parallel systems, which helps to further reduce energy consumption.
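To see why a modest clock reduction can yield such large savings, a rough illustrative model helps: if dynamic CPU power scales roughly with frequency times voltage squared, and supply voltage scales roughly linearly with frequency, power falls with the cube of the clock ratio. The Python sketch below only illustrates this relationship; the cubic assumption and the resulting percentage are a simplification, not LRZ’s measured model, and static power, memory and interconnect dilute the effect in practice.

    # Rough illustration of why lowering the clock frequency saves power.
    # Assumption (not an LRZ model): dynamic CPU power ~ f * V^2, with the
    # supply voltage V scaling roughly linearly with frequency f.

    def relative_dynamic_power(f_reduced_ghz: float, f_nominal_ghz: float) -> float:
        """Return the reduced dynamic power as a fraction of nominal power."""
        ratio = f_reduced_ghz / f_nominal_ghz
        return ratio ** 3  # f * V^2 with V proportional to f

    savings = 1.0 - relative_dynamic_power(2.3, 2.7)
    print(f"Estimated dynamic-power saving: {savings:.0%}")
    # -> roughly 38 % for the CPU cores alone; static power, memory and
    #    interconnect dilute this, broadly consistent with the ~30 % quoted above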
LRZ is also counting on monitoring and artificial intelligence to make HPC more energy-efficient. Operating data can be used to set up intelligent controls and further automate processes. In the more than 6,480 compute nodes of SuperMUC-NG, around 15 million sensors collect data on performance, temperature, the load on individual components, and how the machine handles software and applications. LRZ specialists have developed the open-source software Data Centre Data Base (DCDB) and an initial system for analyzing this information. Both are publicly accessible, discussed with other supercomputing centers, modified, improved, and developed further. This solution is becoming the basis for intelligent computer control. Operating data could also be used to better adapt software and HPC algorithms to a computer’s needs, and programming also offers further opportunities to reduce energy demand, although the effects are difficult to assess.
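To make the idea of operational data analytics concrete, the following minimal Python sketch shows the kind of per-node sampling loop such monitoring builds on. All names and thresholds in it (read_node_sensors, the temperature and power limits) are hypothetical placeholders and do not reflect the actual DCDB interfaces, which are linked below.

    # Minimal, illustrative sketch of per-node operational-data monitoring.
    # The sensor names and thresholds are hypothetical and are NOT the DCDB API.
    import random
    import time
    from typing import Dict, Iterable

    TEMP_LIMIT_C = 60.0      # example alert threshold for cooling-loop temperature
    POWER_LIMIT_W = 700.0    # example alert threshold for node power draw

    def read_node_sensors(node_id: str) -> Dict[str, float]:
        """Placeholder: in practice this would query the node's real sensors."""
        return {
            "water_temp_c": random.uniform(40.0, 52.0),
            "node_power_w": random.uniform(300.0, 750.0),
        }

    def monitor(nodes: Iterable[str], interval_s: float = 10.0) -> None:
        """Periodically sample each node and flag readings that exceed the limits."""
        while True:
            for node_id in nodes:
                sample = read_node_sensors(node_id)
                if sample["water_temp_c"] > TEMP_LIMIT_C:
                    print(f"{node_id}: cooling-loop temperature high: {sample['water_temp_c']:.1f} C")
                if sample["node_power_w"] > POWER_LIMIT_W:
                    print(f"{node_id}: power draw high: {sample['node_power_w']:.1f} W")
            time.sleep(interval_s)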
Tools for monitoring HPC
Data Centre Data Base (DCDB) is a freely available tool for collecting operational data from supercomputers that can be adapted and optimised: https://gitlab.lrz.de/dcdb/dcdb
An initial system for “Operational Data Analytics” (ODA) was developed at the LRZ: https://gitlab.lrz.de/dcdb/dcdb/-/tree/master#dcdbanalytic