Short paths between data processing and storage
Chip of the Cerebras CS-2 system: compute power and memory are close together, which accelerates machine learning. Photo: Cerebras Systems.
A face-to-face workshop at the Leibniz Supercomputing Centre demonstrated the potential of the CS-2 system from Cerebras Systems for science and research: besides training large language models and pattern recognition, it can also accelerate high-performance computing applications.
New technologies arouse curiosity but also require explanation: "The architecture of the CS-2 system is unique and new," says Michael Gerndt, Professor at the Department of Computer Science at the Technical University of Munich (TUM). The scientist is taking part in a workshop at the Leibniz Supercomputing Centre (LRZ) on the functions and possibilities of the CS-2 system from Cerebras Systems, a computer specialised for artificial intelligence applications that has been available to LRZ researchers for around two years thanks to Bavaria’s Hightech-Agenda. Gerndt researches parallel systems and programming tools and wants to learn more about the system: "I want to use it in my lecture on computer architecture, and it can also be used for HPC applications beyond machine learning, which is also interesting for us."
From background on the innovative computer architecture to tips and tricks for working with it: in July this year, the Cerebras workshop brought together almost 30 researchers at the LRZ, including professors and doctoral students, who wanted to get to know the supercomputer with the world's largest chip from the ground up and to learn how access is granted at the LRZ. “The aim of the workshop was, on the one hand, to build a community of researchers who can use the CS-2 system,” reports Dr Michael Hoffmann from the LRZ's Big Data & Artificial Intelligence (BDAI) team. “We were also able to pick up a few tricks and strategies for using it ourselves.”
Highly integrated chip
The CS-2 system differs significantly from the LRZ's supercomputers and is particularly suited to training large language models: “Large AI models don’t fit on Graphics Processing Units; developers have to break the models down very heavily to distribute them across hundreds of GPUs,” explains Gokul Ramakrishnan, technical lead at Cerebras Systems and lecturer at the workshop. “The model has to be rewritten for use on a cluster.” This is not necessary with the CS-2 system: its 46-square-centimetre chip integrates some 2.6 trillion transistors into 850,000 computing cores, along with 40 gigabytes of on-chip memory. While many classical AI clusters load data block by block onto processors, process it and write the results back to memory, on the gold-coloured Wafer Scale Engine (WSE) data can flow from core to core at 20 petabytes per second. The processing units sit right next to the on-chip memory, which speeds up data transfer and thus machine and deep learning.
Profile: Cerebras Wafer Scale Engine-2
850,000 cores optimized for sparse linear algebra
46,225 mm² of silicon
2.6 trillion transistors
40 gigabytes of on-chip memory
20 PByte/s memory bandwidth
220 Pbit/s fabric bandwidth
7 nm process technology
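The difference Ramakrishnan describes can be illustrated with a small PyTorch sketch (purely illustrative, not code for the CS-2 or for any specific cluster; the model, layer sizes and device names are made up): once a model no longer fits into the memory of a single GPU, developers have to decide by hand which layers live on which device and move activations between them, and this partitioning is exactly what a single wafer-scale chip avoids.

# Illustrative sketch of manual model splitting on a GPU cluster
# (hypothetical model and sizes, not Cerebras or LRZ code).
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network is placed on GPU 0 ...
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        # ... second half on GPU 1; every placement is a manual decision.
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Activations have to be copied between devices at the split point.
        return self.part2(x.to("cuda:1"))

model = SplitModel()
output = model(torch.randn(8, 4096))

For a model spread over hundreds of GPUs, this kind of partitioning and the associated communication have to be planned for the whole network, which is the rewriting effort the quote refers to.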
The technology is also suitable for High Performance Computing (HPC) applications: “Projects that require high memory bandwidth and benefit from having working data stored close to the processing cores can be efficiently executed on the CS-2,” says LRZ researcher Jophin John, also from the BDAI team. “The first HPC projects using the technology are currently underway at the LRZ.”
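A minimal sketch of the kind of kernel John means, here in NumPy and purely illustrative (not one of the LRZ projects): a stencil update reads and writes large arrays while performing only a few arithmetic operations per grid point, so its speed is limited by memory bandwidth rather than by compute power.

# Illustrative sketch of a bandwidth-bound HPC kernel (not an LRZ project):
# a 1-D three-point stencil update that streams whole arrays through memory
# while doing only a handful of floating-point operations per point.
import numpy as np

def stencil_step(u, alpha=0.25):
    # One explicit update step on a 1-D grid with fixed boundary values.
    new = u.copy()
    new[1:-1] = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return new

u = np.zeros(1_000_000)
u[500_000] = 1.0              # point source in the middle of the grid
for _ in range(100):          # every sweep re-reads the entire array
    u = stencil_step(u)

The closer such working data sits to the computing cores, the less time is spent waiting for memory, which is the property John highlights.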
Model Zoo and software for custom applications
For developing custom models and programs, the system runs its own software stack based on the PyTorch framework, together with the Model Zoo, a selection of widely used AI models such as Bert, Llama, Mistral and other important transformers. The workshop focused on how researchers can use these to code their own models or adapt existing ones to their individual needs, and included small hands-on exercises. Participants learned how to implement models on the CS-2, prepare training data and process it efficiently. “When pre-training large language models, it is crucial to first experiment with smaller versions of the data set. This makes it possible to determine the hyperparameters for training, which can then be carried out in an energy-efficient manner,” recommends Hoffmann. “If the models from the Model Zoo are used, the CS-2 is user-friendly; implementing your own custom models can become a challenge, and that is something we support users with.” If necessary, Cerebras Systems will also adapt new models to its CS-2 system at the LRZ, and the BDAI team can invite users to Cerebras' office hours, which take place every Tuesday from 4 pm for consultation and discussion of questions.
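Hoffmann's recommendation can be sketched in plain PyTorch (a simplified illustration with made-up model, data and helper names, not the Cerebras Model Zoo workflow): short trial runs on a small subset of the data compare hyperparameter candidates before the full, energy-intensive training is launched.

# Illustrative sketch (hypothetical names, not the Cerebras Model Zoo API):
# compare hyperparameter candidates on a small slice of the data before
# committing to the full pre-training run.
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-in for a real corpus: 10,000 samples, of which only 1,000
# are used for the cheap trial runs.
full_data = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 2, (10_000,)))
trial_data = Subset(full_data, range(1_000))

def trial_run(lr, batch_size, steps=200):
    # Train a small proxy model briefly and return the final loss.
    model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 2))
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(trial_data, batch_size=batch_size, shuffle=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    step, loss = 0, None
    while step < steps:
        for x, y in loader:
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= steps:
                break
    return loss.item()

# Pick the learning rate / batch size that behaves best on the subset,
# then carry it over to the full training run.
candidates = [(1e-3, 32), (3e-4, 64), (1e-4, 128)]
best_lr, best_batch = min(candidates, key=lambda c: trial_run(*c))

The hyperparameters found this way (here learning rate and batch size) can then be reused for the full run, so that the expensive training only has to be carried out once.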
“Computing resources are always in high demand in research. The promise of being able to train large AI models from scratch with the CS-2 attracted me to participate,” said Dr Niki Kilbertus, Professor at the TUM School of Computation, Information and Technology. “In particular, projects where we cannot easily reuse or fine-tune existing large language models, such as models for microbiome DNA data, could benefit greatly from Cerebras.” It looks as though the workshop will bring new projects to the LRZ. (vs)
The Model Zoo for the CS-2
The software stack for the CS-2 is based on PyTorch. Cerebras Systems has also implemented many now-common AI models and transformers on the CS-2, among them: Bert, Bloom, Codegen, Dpr, Falcon, Flan-ul2, GPT (2, 3, 4, J, NeoX), Llama, Llava, Mistral, Mpt, Octocoder, Roberta, Santacoder, SqlCoder, T5, Transformer, UL2, Wizardcoder.