Compute, data, and algorithmic advances are the three fundamental factors that drive progress in modern Machine Learning (ML). Researchers have studied trends in the most readily quantified of these factors – compute.
They show that:
Before 2010, training compute grew in line with Moore's law, doubling roughly every 20 months.
With the rise of Deep Learning in the early 2010s, the scaling of training compute accelerated, doubling approximately every 6 months.
In late 2015, a new trend emerged as firms developed large-scale ML models with 10 to 100-fold larger training compute requirements.
Based on these observations, they split the history of compute in ML into three eras: the Pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. Overall, the work highlights the fast-growing compute requirements for training advanced ML systems.
They conducted a detailed investigation into the compute demands of milestone ML models over time, making the following contributions:
1. They curate a dataset of 123 milestone Machine Learning systems, annotated with the compute used to train them.
2. They tentatively frame the trends in compute in terms of three distinct eras: the Pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. They offer estimates of the doubling times during each of these eras.
3. They extensively check their results in a series of appendices, discussing alternative interpretations of the data and differences from previous work.
They studied trends in compute by curating a dataset of training compute for more than 100 milestone ML systems, and used this data to analyze how training compute has grown over time.
The findings seem consistent with previous work, though they indicate a more moderate scaling of training compute.
In particular, they identify an 18-month doubling time between 1952 and 2010, a 6-month doubling time between 2010 and 2022, and a new trend of large-scale models between late 2015 and 2022, which began 2 to 3 orders of magnitude above the previous trend and displays a 10-month doubling time.
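A doubling time translates directly into an annual growth factor and a cumulative growth multiple for each era. A minimal sketch in Python, using the doubling times and era boundaries quoted above (the era lengths are approximate):

```python
# Translate a compute doubling time (in months) into an annual growth
# factor, and estimate the cumulative growth over an era.
# Doubling times and era spans are taken from the text above;
# exact boundaries are approximate.

def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor per year for a given doubling time in months."""
    return 2 ** (12 / doubling_months)

def era_growth(doubling_months: float, years: float) -> float:
    """Total multiplicative growth over an era of the given length."""
    return 2 ** (12 * years / doubling_months)

# Pre-Deep Learning Era: ~18-month doubling time
print(f"{annual_growth_factor(18):.2f}x per year")

# Deep Learning Era: 6-month doubling over 2010-2022 (~12 years)
print(f"{annual_growth_factor(6):.0f}x per year, "
      f"~{era_growth(6, 12):.1e}x total")
```

A 6-month doubling time therefore means training compute quadruples every year, which compounds to roughly seven orders of magnitude over the 2010–2022 span.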
One aspect they have not covered in this article is another key quantifiable resource used to train Machine Learning models — data. They will be looking at trends in dataset size and their relationship to trends in compute in future work.
Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technology and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.