AI Model Trained With 174 Trillion Parameters

The BaGuaLu AI system used the Chinese Sunway exascale supercomputer to train the largest AI model to date, with over 174 trillion parameters. The miraculous capabilities of neural network AI systems like ChatGPT (AI-generated novel text and stories), DALL-E (AI-generated novel images) and AlphaFold2 (protein folding) come from the growth of AI models. Going to 100 trillion parameters means a model can take all of the text data on the internet, or all of the pictures or all of the videos, and learn from those massive datasets.

In recent years, the size of the largest deep learning models has grown enormously. Within the field of natural language processing, the largest models went from 94 million parameters in 2018 to 17 billion parameters in early 2020. In February 2020, Microsoft also announced training tools (DeepSpeed and ZeRO) capable of handling models with 100 billion parameters.

Metaculus, the public forecasting site, set up a question in February 2020 asking whether a 100-trillion-parameter deep learning system would be trained by 2026. I was at 99% certainty that it would happen. The BaGuaLu work, published in March 2022, trained a 174-trillion-parameter model. Unpublished systems are probably already at the 200-500 trillion parameter level.

Advances in neural network architectures, such as the 2020 Reformer, have enabled training of large models with much greater memory efficiency. 100 trillion is also considered by some to be the median estimate of the number of synapses in a human neocortex.
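One memory-saving idea from that era is to avoid materializing the full attention matrix by processing queries in chunks. The sketch below is a minimal illustration of chunked attention in plain NumPy; it is not the Reformer's LSH attention itself, and the sequence and head sizes are arbitrary example values.

    import numpy as np

    def chunked_attention(q, k, v, chunk_size=128):
        """Compute softmax(q @ k.T / sqrt(d)) @ v one query chunk at a time,
        so the full (seq_len x seq_len) attention matrix is never stored."""
        seq_len, d = q.shape
        out = np.empty_like(v)
        scale = 1.0 / np.sqrt(d)
        for start in range(0, seq_len, chunk_size):
            end = min(start + chunk_size, seq_len)
            scores = (q[start:end] @ k.T) * scale          # (chunk, seq_len)
            scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)
            out[start:end] = weights @ v                   # (chunk, d)
        return out

    # Example: a 4096-token sequence with 64-dimensional heads
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((4096, 64)) for _ in range(3))
    print(chunked_attention(q, k, v).shape)  # (4096, 64)

Peak memory here scales with chunk_size x seq_len rather than seq_len x seq_len, which is the kind of saving that makes very long sequences and very large models trainable.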

Large-scale pretrained AI models have shown state-of-the-art accuracy in a series of important applications. As the size of pretrained AI models grows dramatically each year in an effort to achieve higher accuracy, training such models requires massive computing and memory capabilities, which accelerates the convergence of AI and HPC. However, there are still gaps in deploying AI applications on HPC systems, which need application and system co-design based on specific hardware features.

To this end, this paper proposes BaGuaLu, the first work targeting training brain-scale models on an entire exascale supercomputer, the New Generation Sunway Supercomputer. By combining hardware-specific intra-node optimization and hybrid parallel strategies, BaGuaLu enables decent performance and scalability on unprecedentedly large models. The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models with a performance of over 1 EFLOPS using mixed-precision and has the capability to train 174-trillion-parameter models, which rivals the number of synapses in a human brain.
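Parameter counts at this scale generally come from mixture-of-experts (MoE) designs, where most of the weights sit in expert feed-forward layers and only a few experts are activated per token. The back-of-the-envelope count below is a rough sketch using made-up layer sizes and expert counts; it is not the actual BaGuaLu configuration.

    # Rough MoE parameter count; every size below is an illustrative assumption.
    def moe_param_count(layers, d_model, d_ff, n_experts):
        attention = 4 * d_model * d_model               # Q, K, V, output projections
        expert_ffn = 2 * d_model * d_ff                 # one feed-forward expert
        per_layer = attention + n_experts * expert_ffn  # experts dominate the total
        return layers * per_layer

    # With enough experts per layer, the total reaches the trillions even though
    # each token only activates one or two experts.
    print(f"{moe_param_count(layers=96, d_model=16384, d_ff=65536, n_experts=64):.2e}")
    # ~1.3e13, i.e. about 13 trillion parameters for this made-up configuration

Scaling the expert count, and sharding those experts across nodes (which is what the hybrid parallel strategies handle), is how totals in the 100-trillion range become reachable without every parameter touching every token.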

Tesla Dojo and Others Racing to 100 Exaflops and 100X to 1000X the BaGuaLu System in 3-5 Years

Tesla is looking to mass produce the Dojo AI training system.

Tesla has designed the IO (input/output), memory, software, power, cooling and every other aspect of the system to scale cleanly. This lets them simply build and add compute tiles to reach the exaflop level and beyond.

About 120 compute tiles would equal 1.1 Exaflops and 120,000 compute tiles would be 1100 Exaflops (or 1.1 Zettaflops).
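A quick sanity check on that arithmetic, assuming roughly 9 PFLOPS (BF16/CFP8) per training tile, the per-tile figure Tesla has cited publicly; the totals below are order-of-magnitude estimates, not a system specification.

    PFLOPS_PER_TILE = 9  # assumed per-tile throughput (BF16/CFP8)

    for tiles in (120, 120_000):
        eflops = tiles * PFLOPS_PER_TILE / 1_000
        print(f"{tiles:>7,} tiles ≈ {eflops:,.1f} EFLOPS")

    #     120 tiles ≈ 1.1 EFLOPS
    # 120,000 tiles ≈ 1,080.0 EFLOPS (about 1.1 zettaflops)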

This AI training will be used to perfect self-driving and to train the Teslabot.

Google, Facebook, Amazon, Microsoft, Nvidia and other big technology companies are racing to make larger and more capable AI and AI training systems.

3 thoughts on “AI Model Trained With 174 Trillion Parameters”

  1. That many parameters, or that many data points? If it truly is parameters, how many data points per parameter on average? Parameters relate to the breadth of the AI, but data points relate to the depth, which relates to theoretically possible accuracy.

  2. I have been using ChatGPT for a few weeks now, and I can tell that it is really easy to get used to it.

    It’s clearly not sentient or whatever; the illusion is good, but it peels away as you use it more often and learn its limits. Still, it easily settles into the role of a patient tutor on arcane topics and a useful assistant if you want to perform programming or automation tasks.

    To the point that you miss it when it’s gone, as recently happened to thousands of users in one of the periodic outages, and not just for the missing functions. I imagine that’s the feeling of our brains getting a dopamine high from having a “team” that solves problems together, and a low when the peers are gone.

    We can indeed get addicted to these companions, especially if they provide meaningful answers and do help.

    And this is with GPT-3, which will soon be passé with GPT-4 coming this spring. I shudder to think what kind of thing the CCP could do, twisted to be compliant with whatever version of reality they like.
