"We Achieved a Transfer Rate of 7 Terabit per second"
In former times, knowledge was written in books; today it is copied onto tapes and stored in tape libraries. Photo: W. Bauer/LRZ
At night, in the data and archive room of the Leibniz Supercomputing Centre (LRZ), the work of the robots begins with a whirring sound: with their gripper arms, they pull tapes out of the compartments of the libraries, store on them the 40 million backup copies that around 5,000 computer systems in the Munich Scientific Network (MWN) send in every day, and put them back again. 25 years ago, the LRZ centralised its data storage - Werner Baur, who launched the Archive and Backup System (ABS) and watched it grow, describes the changes and challenges. His most important observation: although the hardware has changed a great deal over the past decades and capacities have grown enormously, little has changed in the principle of backup and storage.
Werner Baur, storage specialist and keeper of the LRZ's data treasure
In 1996, the LRZ set up the archive and backup system; its importance grew, and with it the technical dimensions. Why? Werner Baur: Because the importance and the amount of data grew. Data is becoming more and more important in our society, and it must not be lost. That was true 25 years ago and it is even more true today. The security measures are correspondingly multi-layered today, and at the end of the chain, as the "last line of defence" so to speak, stands our archive and backup system. In science there is an additional factor - measurement data should remain available in the long term. Research results should be verifiable at any time, and they can be further evaluated and combined with new data.
How is the ABS structured? Baur: We started in 1995 by setting up two servers, a small tape library and four tape drives, which went into operation in 1996. We were able to fit about 10 terabytes of storage space on two square metres of floor space. Today, 20 servers, 5 tape robots with over 70,000 slots, 126 tape drives and more than 2,300 hard drives fill an entire floor of the computer cube. At 125,000 terabytes, we now also have somewhat more capacity than back then. Although the hardware has changed a lot, the service still runs with the same concept and the same software from IBM. Only the name has changed several times: in the beginning the software was called ADSTAR Distributed Storage Manager (ADSM), then Tivoli Storage Manager (TSM), and now IBM Spectrum Protect (ISP).
How does the backup work? Baur: LRZ users get an ID and download the ISP client, which then scans their systems for new or changed files and transfers them to the LRZ as a backup, usually at night. Archiving works differently - LRZ users themselves configure how long research, administrative and working data is stored. The standard archiving period is 10 years.
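To illustrate the principle of such an incremental backup - this is only a sketch in Python, not the ISP client itself, and the directory and state-file paths are made up - a script can walk the file tree and pick out everything that is new or has changed since the previous run:

```python
import json
from pathlib import Path

STATE_FILE = Path("backup_state.json")   # hypothetical bookkeeping file from the last run
BACKUP_ROOT = Path("/data/projects")     # hypothetical directory tree to protect

def load_state() -> dict:
    """Snapshot of the previous run: path -> [mtime, size]."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}

def select_changed(root: Path, state: dict) -> list[Path]:
    """Return files that are new or whose mtime/size differs from the snapshot."""
    changed = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        key = str(path)
        if state.get(key) != [st.st_mtime, st.st_size]:
            changed.append(path)
            state[key] = [st.st_mtime, st.st_size]
    return changed

if __name__ == "__main__":
    state = load_state()
    to_send = select_changed(BACKUP_ROOT, state)
    print(f"{len(to_send)} new or changed files would be sent to the backup server")
    STATE_FILE.write_text(json.dumps(state))
```

The real client does this far more robustly; the sketch only mirrors the idea of comparing the current state of the file system against the last known one and sending only the differences.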
125 petabytes of storage capacity - how much of that is used? Baur: Around 110 petabytes are stored, distributed over 50,000 tapes. At the moment, about 150 terabytes are added every day, so the remaining 15 petabytes will only last on the order of a hundred days. We are compressing a lot, shifting data around, and the system will be expanded considerably in 2022.
What is the greater challenge - technical problems or user error? Baur: That depends on the perspective. A common mistake is to set up the backup and back up data regularly, but never run updates or do a restore test. If the worst comes to the worst years later - the hard disk or server is broken and data has to be restored from the ABS - it often goes wrong because the old TSM version no longer works. That then becomes a technical challenge, but we have still managed to save the data a few times. For us as operators of the ABS, the biggest task is probably to keep up with the version cycles and to carry the data along with them.
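The restore test Baur mentions can be automated quite simply. A minimal sketch, assuming a trial restore has already been pulled back into a separate directory (the paths and sample size are illustrative assumptions, not part of the ABS): pick a random sample of files and compare checksums against the originals.

```python
import hashlib
import random
from pathlib import Path

SOURCE_ROOT = Path("/data/projects")           # hypothetical live data
RESTORE_ROOT = Path("/restore_test/projects")  # hypothetical target of a trial restore
SAMPLE_SIZE = 20                               # illustrative sample size

def sha256(path: Path) -> str:
    """Checksum a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def restore_test() -> None:
    """Compare a random sample of restored files against the originals."""
    files = [p for p in SOURCE_ROOT.rglob("*") if p.is_file()]
    for src in random.sample(files, min(SAMPLE_SIZE, len(files))):
        restored = RESTORE_ROOT / src.relative_to(SOURCE_ROOT)
        if not restored.exists():
            print(f"MISSING in restore: {src}")
        elif sha256(restored) != sha256(src):
            print(f"CHECKSUM MISMATCH: {src}")
        else:
            print(f"OK: {src}")

if __name__ == "__main__":
    restore_test()
```

Run regularly, such a test catches both broken backups and outdated client versions before they matter.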
New computers again and again, the move to Garching in 2006: what do you need to move data? Baur: First and foremost, time. The first migration in 1995/96, when we moved about one terabyte of data into the ABS, took us about six months. It is taking us a similar amount of time now to transfer 40,000 terabytes from the SuperMUC-NG archive to the new Data Science Archive, DSA for short. The move to Garching in 2006 was faster. You wouldn't think so in the age of high-speed networks, but the fastest method was to transport all the tapes by truck instead of transferring the 1.5 petabytes of data over the network. With the lorry, we achieved an effective transfer rate of 7 terabits per second - which, by the way, current networks still cannot match.
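The 7 terabits per second are easy to reconstruct. A back-of-the-envelope check in Python, assuming the truck took roughly half an hour from the old LRZ site in Munich to Garching - the drive time is an assumption, not something stated in the interview:

```python
# Effective throughput of the tape transport by truck. The 30-minute drive
# time is an assumption for illustration; it is not stated in the interview.
payload_bits = 1.5e15 * 8        # 1.5 petabytes expressed in bits
drive_time_s = 30 * 60           # assumed duration of the truck ride in seconds
throughput_tbit_per_s = payload_bits / drive_time_s / 1e12
print(f"~{throughput_tbit_per_s:.1f} Tbit/s")   # ~6.7 Tbit/s, i.e. roughly 7 Tbit/s
```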
Why is data recopied? Baur: Because the tape material ages and data can become unreadable. For security reasons and because the new media are faster and have a much higher storage capacity, we replace them every 5 to 7 years. This hurts my soul, because the old tapes cannot be reused for data protection reasons. Between migration cycles, the system checks the data on the tapes for errors. In addition, copies of our archive data sets are stored in another data centre. This way we are prepared for disaster.
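The recopying cycle can be thought of as a simple age check per tape. A sketch with invented tape labels and dates, using the 5-to-7-year replacement window mentioned in the interview:

```python
# Sketch of an age-based recopy check; tape labels and dates are invented.
from datetime import date

MIGRATION_AGE_YEARS = 6   # the interview mentions replacing media every 5 to 7 years

tapes = {                 # label -> date the tape was first written
    "A00123": date(2015, 3, 1),
    "A00456": date(2019, 11, 12),
    "B00789": date(2021, 6, 30),
}

today = date.today()
for label, first_use in tapes.items():
    age_years = (today - first_use).days / 365.25
    if age_years >= MIGRATION_AGE_YEARS:
        print(f"{label}: {age_years:.1f} years old -> schedule recopy onto new media")
    else:
        print(f"{label}: {age_years:.1f} years old -> ok")
```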
Tapes and hard disks have often been pronounced dead - what will data backup look like in the future? Baur: The differences between tape and disk in transfer speed and capacity are becoming smaller and smaller. But for very large amounts of data, tape will remain the most economical storage medium for a long time. Companies like Amazon, Google and Microsoft rely on it - a guarantee that the technology will continue to be developed. One thing is certain - data backup remains indispensable. Even if primary storage media guaranteed absolute, unlimited durability, backups would still be needed as protection against cybercrime and human error. (vs)
Data, data, big data: the archive and backup system of the LRZ keeps growing strongly
|                              | 1995    | 2021       |
| Stored data                  | 1 TB    | 110,000 TB |
| Number of tape drives        | 4       | 126        |
| Number of tape libraries     | 1       | 5          |
| Storage capacity per tape    | 0.01 TB | 12 TB      |
| Writing speed per tape drive | 9 MB/s  | 360 MB/s   |
Robot arms of a tape library in action. Photo: W. Baur/LRZ