Extreme Scaling on SuperMUC-NG
SuperMUC-NG at the Leibniz Supercomputing Centre (Photo: F. Löchner)
Since the beginning of 2020, the SuperMUC-NG has been operating at more than 80 percent of its capacity. With its 311,040 compute cores, it delivers a peak performance of 26.9 PFLOP/s and offers more than 700 terabytes of main memory. To help scientists push the SuperMUC-NG and Europe's fastest computers to their limits, the Leibniz Supercomputing Centre (LRZ) organizes an Extreme Scaling Workshop once or twice a year. Experts share techniques for using the compute nodes efficiently and for eliminating bugs in applications. Applicants describe their projects in detail to apply for the workshop, which lasts several days. Gerald Mathias, a physicist with a habilitation and a member of the LRZ's application support team, knows how supercomputers gain speed and calculate even faster.
Why do supercomputer users need an Extreme Scaling Workshop?
Dr. Gerald Mathias: The users of the SuperMUC-NG are primarily scientists who develop applications for their research questions. They focus on simulations and data, not necessarily on computer technology. However, their programs should use as many nodes of the SuperMUC-NG as possible, preferably even all of them. The Extreme Scaling Workshop therefore focuses on how applications can be optimized, how the machine can be fully utilized, and how computations can be accelerated. The workshop helps the participants to make the best use of their allocated computing time, and it helps us to better utilize the SuperMUC-NG and to get to know the applications that will be running on our systems in a few years.
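Whether a code really makes use of the whole machine is commonly judged by its parallel efficiency: the same problem is run on more and more nodes, and the measured speedup is compared with the ideal one. The following C sketch is only a minimal illustration of that bookkeeping; the node counts and runtimes in it are made-up example values, not measurements from SuperMUC-NG.

```c
/* Minimal sketch: strong-scaling efficiency from measured runtimes.
 * The node counts and timings below are hypothetical example values. */
#include <stdio.h>

int main(void) {
    int    nodes[]   = {64, 128, 256, 512};
    double runtime[] = {100.0, 52.0, 28.0, 17.0};   /* seconds, hypothetical */
    int n = sizeof(nodes) / sizeof(nodes[0]);

    for (int i = 0; i < n; i++) {
        double speedup    = runtime[0] / runtime[i];
        double ideal      = (double)nodes[i] / nodes[0];
        double efficiency = speedup / ideal;        /* 1.0 = perfect scaling */
        printf("%4d nodes: speedup %.2f, efficiency %.0f%%\n",
               nodes[i], speedup, efficiency * 100.0);
    }
    return 0;
}
```

When the efficiency drops noticeably as nodes are added, the application is no longer using the extra hardware well, which is exactly the kind of behavior the workshop tries to diagnose and fix.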
What are the main topics?
Dr. Mathias: On the one hand, existing programs are put to the test. Some errors or problems only become visible at extreme scale, for example when applications require too much memory or communication between compute nodes takes too long. This costs computing time, so we want to detect and eliminate such errors during the workshop. The hunt for records plays a role here, as does deepening knowledge of the latest technology and of science at a high level. When optimized programs can simulate even larger and better models of the origin of the universe, for example, the researchers are happy, and so are we. It is all about the future viability of simulations: what is just barely possible on today's supercomputers will be part of everyday life on the next generation of machines. Debugging and optimizing means starting and evaluating many jobs; in normal operation that would keep you busy for months, which is why the Extreme Scaling Workshop helps. Thanks to exclusive access to the machine, the most important optimizations can be done in three days, and the participants can draw on the help of experts from Intel, Lenovo and the LRZ.
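A typical first step in tracking down the communication costs mentioned above is simply to time a collective operation as the number of ranks grows. The C/MPI sketch below is a minimal illustration of that idea, not workshop material; it measures one MPI_Allreduce and reports the time seen by the slowest rank.

```c
/* Minimal illustration: timing a collective to expose communication
 * cost at scale. Build with an MPI compiler wrapper, e.g. mpicc. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, global = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);          /* align ranks before timing */
    double t0 = MPI_Wtime();
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    double dt = t1 - t0, max_dt;
    /* the slowest rank determines the real cost of the collective */
    MPI_Reduce(&dt, &max_dt, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks: allreduce took %.3e s (max over ranks)\n",
               size, max_dt);

    MPI_Finalize();
    return 0;
}
```

Repeating such a measurement at increasing node counts shows whether communication time grows faster than the computation it supports, which is when it starts to dominate the runtime.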
The workshop is designed for the SuperMUC-NG. Will I be able to work better on other supercomputers afterwards?
Dr. Mathias: The architecture of today's supercomputers is very similar: they all consist of many compute nodes connected by a fast network. Even if the nodes differ in detail, the core skills (optimization, debugging, communication between nodes, the use of memory) carry over to all supercomputers. In this sense, the Extreme Scaling Workshop helps with high-performance computing in general.
Demand for these workshops is high, and participants apply with a large-scale project. What criteria do you use to select them?
Dr. Mathias: We can consider a maximum of eight projects, often with teams behind them. Participants need experience in working with supercomputers. In addition, the required computing power and the amount of data to be processed should be demanding; in other words, the research project should be challenging, groundbreaking and innovative. And of course, the projects should keep the SuperMUC-NG working at full capacity.