A joint development

The LRZ is the first supercomputing centre in Europe to rely on an innovation partnership to procure its next supercomputer. Planning and development involve many colleagues, and the BEAST test environment from the Future Computing programme also delivers valuable information.

 

New computer technologies are currently a frequent topic of discussion at the LRZ. Once a week, three colleagues from each of the teams Future Computing, Artificial Intelligence and Big Data, as well as from Computational X Support (CXS), meet with representatives from HPE or Lenovo to discuss the design of future processor types, network technologies and other components. "We discuss what computing nodes and systems should look like, how they should be built and how they should work," says Amir Raoofy, computer scientist, engineer and member of Future Computing. "It's about understanding different design options, balancing their performance against the costs involved, or coming up with benchmark ideas to evaluate these options." The regular discussion groups serve to prepare, specify and optimise the developments for the LRZ's next HPC system.

Different perspectives and experiences needed

Supercomputing technology is currently developing rapidly. In addition to different Central Processing Units (CPUs), a variety of accelerators are available today, for example Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). There are also various techniques for connecting these components to ensure fast data flow. But as the range of potential components grows, developing a supercomputer becomes more complex: "That's why our next HPC system is being co-designed with technology providers. Together we are currently developing prototypes or optimising components," explains Prof. Dr Dieter Kranzlmüller, Head of the Leibniz Supercomputing Centre (LRZ). "Hardware and software are coordinated closely with our specialists for the desired application." The innovation partnership sets the legal and economic framework for this close cooperation. It is a new procurement procedure in the form of a multi-stage competition between potential supplier or integrator companies that can build the supercomputer for the LRZ, develop prototypes for it and embed innovative components from other manufacturers.

Today's supercomputers are expected to not only process large amounts of data and perform classical simulations, but also master artificial intelligence (AI) methods, operate as energy-efficiently as possible and, in the near future, even be able to accommodate quantum technologies. As a result, a wide variety of perspectives and experiences are needed to build innovative systems. For this reason, more colleagues than ever before are involved in the development of the new supercomputer at the LRZ. At least 12 specialists from seven working groups plus all the management levels are constantly working on the individual development steps and the necessary optimisations. In addition, the Bavarian Energy, Architecture and Software Testbed (BEAST) is used to clarify questions.

Engaging in discussions, communicating with and informing others

HPE and Lenovo won the first concept phase of the innovation partnership in 2022. Since then, they have set up racks with initial prototypes in separate rooms and are discussing these solutions with their respective LRZ team every week. "The companies are happy to learn from us what the requirements of different user groups are; they want to know how and what a supercomputer will be used for and what kinds of applications will run on it one day," says Dr Josef Weidendorfer, head of the Future Computing programme. To ensure fair competition, all participants are sworn to secrecy. The prototypes are installed in locked rooms, and each manufacturer is assigned a cross-departmental trio that can inform the other colleagues about possible services and functionalities. However, the trio is not allowed to talk about the design of the technology, as that would affect the patent rights and intellectual property of both companies. "The actual hardware is not yet produced," Raoofy notes. "But these prototype systems can, on the one hand, help to estimate and understand the technology of the future HPC system, and on the other, the development of system software, for example for operational monitoring or energy management, can already be discussed and started in greater depth."

With the arrival of the first working models, the LRZ team has also established an efficient communication process: The technology companies meet regularly with their three contacts, who, in addition to Future Computing, primarily cover the application-oriented areas of data and AI as well as HPC and thus future application areas of the supercomputer. What these two teams learn from and discuss with the companies in terms of technical solutions is then regularly debated in various internal department and management meetings: "In this way, we bring all the colleagues involved up to speed and we bring different perspectives into the discussion on technical solutions and proposals," Raoofy explains. "In addition, the companies sometimes need feedback on specific requirements, for example on how they can design storage or operating concepts, or on possible operating systems or on the use of management software. We then ask for feedback from the colleagues in charge, and we incorporate it into our next discussions with the companies."

Involving application groups in planning

BEAST also builds a bridge between the present and the future, between technology that is already available and planned innovations: "BEAST contains different processor and accelerator architectures. This allows us to clarify expectations and set our own benchmarks," Weidendorfer reports. The test environment is constantly being expanded through loans and purchases. At present, systems from AMD and Intel, ARM processors and GPUs from NVIDIA are installed, as well as various accelerators. In this way, different architectures and new types of components come together, offering opportunities to draw comparisons and allowing a glimpse of current technology trends and possible development paths, including in the innovation partnership: "Before we started, the BEAST systems offered us an overview of available hardware developments," explains Dr Gerald Mathias, head of the CXS-Lab, which supports researchers in their work with the LRZ supercomputers. "We were able to test codes, measure performance data and optimise on different architectures or processor types. And with these findings, we are now able to follow in detail what the impact of current innovations is."

The results from experiments in the test field, together with observations from research work with various chip concepts and performance and comparison data, were incorporated into the specifications for the future supercomputer. They also form the basis for various benchmark suites, which can now be used to evaluate the prototypes objectively. With the help of BEAST, scientists and users of SuperMUC-NG were also able to participate in the planning of the new supercomputer.

In BEAST, research groups test how their applications work on different computer architectures. The scientists of the SeisSol seismology project, for example, wanted to know whether they could add new parameters to their earthquake code when CPU and GPU work together. They also wanted to know how the code can be implemented on new processors and how it works with them. "These kinds of tests only work on BEAST," says application and support specialist Mathias. "These experiments allowed us to bring the experiences and wishes of our users into the innovation partnership." This resulted in potential application scenarios and additional comparative benchmarks that the technology companies can use to specify their concepts. "BEAST is used for tech scouting; we use it to evaluate available hardware and software stacks," concludes Dr Herbert Huber, head of the High Performance Systems department at the LRZ. "The innovation partnership, however, is aimed at future technologies that will not be available on the market until 2025 at the earliest. But we were able to formulate requirements in advance and indicate what direction the development in the innovation partnership should take."

The innovation partnership between technology companies and LRZ is a very productive give-and-take: The companies can tailor technology closely to its intended use, and the LRZ serves as a provider of ideas and a supporter of technical developments. The LRZ learns about innovations earlier, can react to them with purchases, receives loans for BEAST - often as part of research projects - and even receives donations. "We get access to new technology earlier," confirms Huber, "we can cooperate more closely with manufacturing companies and co-develop or optimise technical innovations." (vs/ssc)