Ten Exaflop European Supercomputer in 2021

The European supercomputer Leonardo, managed by Cineca, will be installed at the end of 2021 in a new data center in Bologna. It will be based on Atos BullSequana XH2000 technology and feature nearly 14,000 NVIDIA Ampere architecture-based GPUs and NVIDIA Mellanox HDR InfiniBand. It will deliver 10 exaflops of FP16 AI performance.

Capable of an aggregate HPL performance of 250 petaflops (250 quadrillion floating-point operations per second) and equipped with over 100 petabytes of state-of-the-art storage, the system will provide 10 times the computing power of Cineca’s current top-tier system, Marconi100, which is currently ranked ninth on the global TOP500 list of the world’s most powerful supercomputers.
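
The two headline numbers use different prefixes and different precisions, so it can help to put them on one scale. The sketch below is only illustrative arithmetic on the figures quoted in this article; the helper name `to_flops` is made up for the example, and the Marconi100 value is simply what the "10 times" claim implies, not a measured result.

```python
# SI prefixes used for the FLOPS figures in the article.
PREFIX = {"tera": 10**12, "peta": 10**15, "exa": 10**18}

def to_flops(value, prefix):
    """Convert a prefixed FLOPS figure to plain operations per second."""
    return value * PREFIX[prefix]

leonardo_hpl = to_flops(250, "peta")    # 250 petaflops, HPL (FP64)
leonardo_ai = to_flops(10, "exa")       # 10 exaflops, FP16 AI

# Assumption: the "10 times Marconi100" claim is taken literally.
implied_marconi100 = leonardo_hpl / 10

print(f"Leonardo HPL:       {leonardo_hpl:.2e} FLOPS")
print(f"Leonardo FP16 AI:   {leonardo_ai:.2e} FLOPS")
print(f"AI / HPL ratio:     {leonardo_ai // leonardo_hpl}x")
print(f"Implied Marconi100: {implied_marconi100 / PREFIX['peta']:.0f} petaflops")
```

The FP16 AI figure is 40 times the HPL figure, so the "ten exaflop" headline and the 250-petaflop benchmark number describe the same machine measured in two different ways.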

Leonardo is the first of three pre-exascale systems announced by EuroHPC, a collaboration between national governments and the European Union. Funded by the European Commission and by the Italian Ministry of Universities and Research, EuroHPC’s aim is to develop a world-class supercomputing ecosystem and exascale supercomputing in Europe. The other pre-exascale systems that will join the Italian Leonardo will be installed in Finland and Spain.

The 500-petaflop, Luxembourg-based MeluXina system will be used for financial services, manufacturing, and health care applications. It will connect 800 NVIDIA A100 GPUs on HDR 200Gb/s InfiniBand links. The new Vega supercomputer at the Institute of Information Science in Maribor, Slovenia, will include 240 A100 GPUs and 1,800 HDR 200Gb/s InfiniBand endpoints. The IT4Innovations National Supercomputing Center will host what’s expected to become the most powerful supercomputer in the Czech Republic. It will be an Apollo 6500-based system with 560 A100 GPUs to deliver nearly 350 petaflops of performance for academic and industrial simulations, data analytics, and AI.
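
A quick way to read these figures is to divide each quoted performance number by the quoted GPU count. The sketch below uses only numbers from this article; the function name `per_gpu_teraflops` is made up for the example, and the conclusion drawn from it is an inference rather than something the announcements state here.

```python
# (label, quoted performance in petaflops, quoted GPU count), all from the text.
systems = [
    ("Leonardo, FP16 AI", 10_000, 14_000),  # 10 exaflops, "nearly 14,000" GPUs
    ("Leonardo, HPL", 250, 14_000),
    ("Luxembourg system", 500, 800),
    ("Czech system", 350, 560),             # "nearly 350 petaflops"
]

def per_gpu_teraflops(petaflops, gpus):
    """Quoted system performance divided evenly across its GPUs."""
    return petaflops * 1000 / gpus

for label, petaflops, gpus in systems:
    print(f"{label}: ~{per_gpu_teraflops(petaflops, gpus):.0f} teraflops per GPU")
```

The Luxembourg and Czech figures both work out to about 625 teraflops per GPU, far above the roughly 18 teraflops per GPU implied by Leonardo's HPL number. That suggests the 500- and 350-petaflop figures are reduced-precision AI peaks, comparable to Leonardo's 10-exaflop number rather than to its 250-petaflop HPL result.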

Technical Information
Leonardo will be built from Atos’ BullSequana XH2000 supercomputer nodes, each with four NVIDIA Tensor Core GPUs and a single Intel CPU. It will also use NVIDIA Mellanox HDR 200Gb/s InfiniBand connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability.

Leonardo will feature nearly 14,000 NVIDIA Ampere architecture-based GPUs. It will deliver 10 exaflops of FP16 AI performance. NVIDIA Ampere architecture GPUs can accelerate over 1,800 applications, such as Quantum Espresso for material science, SPECFEM3D for geoscience, and MILC for quantum physics, by up to 70x, turning simulations that were previously major challenges into nearly real-time tasks.

3rd Gen Intel Xeon Scalable processors (Ice Lake) are optimized to perform computationally intensive workloads in high-performance computing systems like Leonardo. The follow-on processor to Intel’s Ice Lake server processors is Sapphire Rapids, which will enable exascale computing with advanced built-in AI acceleration capabilities.

More than 136 BullSequana XH2000 Direct Liquid cooling racks
250 PFLOPs HPL Linpack Performance (Rmax)
10 ExaFLOPS of FP16 AI performance
3456 servers equipped with Intel Xeon Ice Lake and NVIDIA Ampere architecture GPUs
1536 servers with Intel Xeon Sapphire Rapids processors
5PB of High Performance storage
100PB of Large Capacity Storage

3 Modules
5000 computing nodes
150 PB I/O
150PB of storage
1TB/s bandwidth
200Gb/s interconnection bandwidth
PUE 1.08
€240 million investment
1500+ square meter footprint
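
The two spec lists above can be cross-checked against each other and against the prose. This is a rough sketch using only the article's numbers; it assumes that the "four GPUs per node" description applies to the 3456 GPU-equipped servers, which the article does not spell out.

```python
# Figures copied from the spec lists above.
gpu_servers = 3456          # Ice Lake + Ampere GPU servers
cpu_servers = 1536          # Sapphire Rapids servers
gpus_per_node = 4           # assumption: applies to every GPU server
high_perf_storage_pb = 5
large_capacity_storage_pb = 100

total_nodes = gpu_servers + cpu_servers
total_gpus = gpu_servers * gpus_per_node
listed_storage_pb = high_perf_storage_pb + large_capacity_storage_pb

print(f"Total nodes: {total_nodes} (article: 5000 computing nodes)")
print(f"Total GPUs:  {total_gpus} (article: nearly 14,000)")
print(f"Storage:     {listed_storage_pb} PB (article: over 100 PB, and 150 PB)")
```

The node and GPU counts line up with the rounded figures (4992 nodes, 13,824 GPUs). The itemized storage adds up to 105 PB, which matches "over 100 petabytes" but not the 150 PB quoted in the second list; the article does not explain the difference.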

Written By Brian Wang, Nextbigfuture.com

11 thoughts on “Ten Exaflop European Supercomputer in 2021”

  1. Copyright and patents are really quite different things.

    Though I've heard of some attempts to try to "copyright" some code, I was under the impression that these haven't stood up in actual courts when challenged.

  2. For those having a big enough lifespan and talent, it is theoretically possible: https://en.wikipedia.org/wiki/List_of_countries%27_copyright_lengths

    Life + 70 years (works published since 1978 or unpublished works)

    Actually it would be "capitalism" only if intellectual property rights were without expiration and monopoly.

    The advantage of non-government R&D is that skeptics are not obliged to spend their money on projects which they deem unfeasible. But when the technology turns out to be feasible and its product is ready, each consumer must pay a fair share [1] of its IP costs on top of everything else. Those IP costs must include 50-100% annual interest on every R&D expense. Delays must not be taken into account when calculating that interest, of course, only the time which is essential for the R&D of the product. So proponents of some technology (self-driving cars, for instance) may help engineers by buying shares of the IP they are working on at a discount, and thus have a chance to sell those shares at higher prices later if the technology succeeds on the market and consumers need to buy them.

    [1] Fair share of IP costs for a consumer is a ratio of the quantity of products which have been made for that specific consumer using that IP to the quantity of all products which have been made using that IP for everyone. Of course, if the number of consumers is too large, it is more convenient for them to pay that fair share through retailers…

  3. How are you getting a patent lasting for 150 years? Or is this assuming some completely new legislation being passed that isn't being seriously suggested by anyone?

  4. The thing that made "AI" useful recently compared to 10+ years ago is mostly capacity and performance improvements. The algorithms are mostly the same but the models are more complex. One often needs a million data points (training examples) with proper annotation and correct expected output for the model. This is the major bottleneck in the industry. "Kids learning" works the same. We learn by examples and get feedback on our output. We adjust the networks (brain) and repeat examples until output is satisfactory.

    The weighting of the brain network nodes is done when we sleep and dream. That's why we often wake up having solved problems "in our sleep".
    Kids do take a lot of calendar time to learn everything they need. Decades of iterations with training by examples-feedback-sleep.
    Computers can probably do some things a little quicker, as demonstrated by that Go-playing software.

  5. It's sort of incompatible with short term profitability (capitalism)

    Actually, under current US intellectual property legislation, some nefarious people manage to get a monopoly over key patents, impeding development in many branches of technology. Just look at that funny example of Apple patent-trolling Samsung. And more serious cases must exist: in pharmaceuticals, energy generation, etc.

    So a monopoly for 20 or even 150 (!) years could hardly be called "short term". Also, how could it be "capitalism" if only a few corrupt people own that capital? If those few were not even Americans, that would be called occupation by fraud. Oh, wait…

    Regarding government: federal and local governments in the USA take more than $6 trillion from their taxpayers every year! If 0.1% of that money were left to the people, they could invest more than $6 billion in AI-related intellectual property. Every year! Do you know what a 20-year-old, Thomas Sohmers, built with a mere $2 million grant? A processor which is 6-7 times better than a processor from Intel. Search for the Rex Computing Neo 256-core processor.

    So Americans have more than enough wealth to accelerate R&D in all directions simultaneously. Corrupt news outlets and bankers just do not tell them that. And, of course, IP legislation must be made to limit prices established by monopolies.

  6. I think that we can solve the problem of annotating the data more easily if the amount of required data is reduced by a factor of 100x – 1000x. A kid learning to read does not have to train on the letters more than a handful of times, perhaps 100 times or so? And after reading perhaps 50000 words (100 A4 pages), a kid is already pretty proficient.

    So if we get to this level of "learning efficiency" then annotating data will be a cinch…

    Furthermore, most of the data a kid encounters is not annotated, and still they make sense of it. I don't see why an ANN could not use the same (unknown) strategy to "auto-sort".

  7. Other than raw FP compute capacity, the big problem is the complexity of reality and how to feed the models with labelled training data. After all, it takes about three decades to train a human being into excellence in a domain, i.e. a scientist being able to discover new things previously not known. Maybe scratch a few years of basic knowledge from that time budget for things not needed by a machine.
    To some degree, these decades of training data can be reused once developed but it has to be done in the first place at least once and then made available.
    I don't know if anyone is working on that long term problem. It's sort of incompatible with short term profitability (capitalism), so the focus naturally becomes applications in limited domains with a predictable timeline. However, most often, short term capitalism mimics evolution well enough, so the result may well be faster long term development as well. It will still take a lot of calendar time.
    We humans are always in a hurry because our productive life span is so short. Perhaps we change strategies a bit when we can live 10 times longer.

    Another possible outcome is that the human brain / machine interfaces being developed turn out to work really well. The tight integration between our brains and external systems could maybe be used to speed up training and all things AI by orders of magnitude.

  8. I'd like to point out that one exa-operations per second is an upper bound of what the human brain could possibly perform. So, with the correct programming, this computer could be made smarter than a human… Provided the memory bandwidth is sufficient to make effective use of the flops…

    I predict that there will be a breakthrough in reinforcement learning that reduces the required amount of learning data, and then all these already installed and paid-for computers will suddenly be capable of superhuman intelligence. Kind of like the situation that we had with chess software. "Stockfish" can beat any human chess champion as well as earlier programs running on supercomputers, such as "Deep Blue". "Deep Blue" had a rating of about 13 Gflops, which is actually more than today's mainstay tablets. A tablet is all you need for "Stockfish", so it's not a question of raw flops.

    This breakthrough may be two years away, five years away or perhaps 10 years away. But I really cannot imagine any reason why we should not crack the question of how to make more effective learning that is on par with human learning with respect to training data…
