Three Eras of Machine Learning and Predicting the Future of AI

Compute, data, and algorithmic advances are the three fundamental factors that guide the progress of modern Machine Learning (ML). The researchers studied trends in the most readily quantified of these factors – compute.

They show that:
- Before 2010, training compute grew in line with Moore’s law, doubling roughly every 20 months.
- Since the advent of Deep Learning in the early 2010s, the scaling of training compute has accelerated, doubling approximately every 6 months.
- In late 2015, a new trend emerged as firms developed large-scale ML models with 10- to 100-fold larger training compute requirements.

Based on these observations, they split the history of compute in ML into three eras: the Pre Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. Overall, the work highlights the fast-growing compute requirements for training advanced ML systems.
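To make the trend analysis concrete, here is a minimal sketch of how a doubling time can be estimated from milestone data: fit a straight line to log2(training FLOPs) against time and convert the slope into months per doubling. The data points below are made-up placeholders, not the authors' dataset.

```python
import numpy as np

# Hypothetical (year, training compute in FLOPs) pairs, for illustration only;
# these are NOT the actual milestone systems from the curated dataset.
milestones = [
    (2012.5, 1e17),
    (2014.0, 5e17),
    (2016.0, 1e19),
    (2018.0, 3e20),
    (2020.0, 1e22),
]

years = np.array([year for year, _ in milestones])
log2_flops = np.log2([flops for _, flops in milestones])

# Least-squares line: log2(compute) = slope * year + intercept.
# The slope is the number of doublings per year.
slope, intercept = np.polyfit(years, log2_flops, 1)

print(f"Estimated doubling time: {12.0 / slope:.1f} months")
```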

They conducted a detailed investigation into the compute demands of milestone ML models over time. They make the following contributions:
1. They curate a dataset of 123 milestone Machine Learning systems, annotated with the compute it took to train them.
2. They tentatively frame the trends in compute in terms of three distinct eras: the Pre Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. They offer estimates of the doubling times during each of these eras.
3. They extensively check their results in a series of appendices, discussing alternate interpretations of the data and differences with previous work.

They studied trends in compute by curating a dataset of training compute for more than 100 milestone ML systems and used this data to analyze how the trend has evolved over time.
The findings seem consistent with previous work, though they indicate a more moderate scaling of training compute.
In particular, they identify an 18-month doubling time between 1952 and 2010, a 6-month doubling time between 2010 and 2022, and a new trend of large-scale models between late 2015 and 2022, which started 2 to 3 orders of magnitude above the previous trend and displays a 10-month doubling time.
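As a quick arithmetic check on these figures: a doubling time of T months implies that training compute grows by a factor of 2^(12/T) per year. The snippet below applies that conversion to the doubling times quoted above.

```python
# Doubling times quoted in the text, converted into yearly growth factors:
# a doubling time of T months implies compute grows by 2 ** (12 / T) per year.
eras = {
    "Pre Deep Learning Era (1952-2010)": 18,   # months per doubling
    "Deep Learning Era (2010-2022)": 6,
    "Large-Scale Era (late 2015-2022)": 10,
}

for name, months in eras.items():
    print(f"{name}: ~{2 ** (12 / months):.1f}x per year")
```

In other words, the Deep Learning Era trend corresponds to roughly a 4x increase in training compute per year, while the large-scale trend grows about 2.3x per year, but from a starting point 2 to 3 orders of magnitude higher.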

One aspect they have not covered in this article is another key quantifiable resource used to train Machine Learning models — data. They will be looking at trends in dataset size and their relationship to trends in compute in future work.

14 thoughts on “Three Eras of Machine Learning and Predicting the Future of AI”

  1. I would have divided history a bit differently.

    Before 2013, most ML was done using statistical machine learning that employed CPUs for learning. There was some neural network work being done prior to 2013 using GPUs.

    In December 2012, the AlexNet paper was published. That changed everything. Most ML research transitioned to using deep learning and GPUs for training.

    In 2017, the paper Attention Is All You Need was published. That paper introduced the transformer model of deep learning. Again, everything changed, and most large and successful models in deep learning today use transformers. They are also trained using GPUs, but since they are inherently more efficient to train, very large models and datasets became feasible.

    There are a few exceptions. For example, DeepMind created AlphaGo, and from there AlphaGo Zero and then Alpha Zero. They used a combination of deep learning, reinforcement learning and stochastic tree search. The Zero systems don’t really belong on the charts, since technically they didn’t use external training data. Instead, they learned to play games like Go and Chess at superhuman levels by playing themselves.

  2. Humanity has not yet figured out that the human-machine interface has to be restricted, as it has already encroached too deeply on the human-to-human interface.

  3. Not being familiar with the usage of “compute” as a noun (attempts to define it which I found on the web seem kind of nebulous), I’m a bit confused as to whether this is a good thing. Is it saying that it takes more and more computing resources to train a machine learning system so that it can reach new milestones, or is it saying that the increasing availability of such resources is driving these new milestones more often?…

    • “Is it saying that it takes more and more computing resources to train a machine learning system so that it can reach new milestones, or is it saying that the increasing availability of such resources is driving these new milestones more often?…”

      I read it as both. The problem of intelligence has been leashed to another problem – the availability of ‘compute’. That problem of compute has a vast infrastructure of resources behind its advancement, so in a way that’s good, but if you want intelligence to advance faster than our ability to increase compute, well, better work hard on the other two facets of the triad.

  4. Not being familiar with the usage of “compute” as a noun (attempts to define it which I found on the web seem kind of nebulous), I’m a bit confused as to whether this is a good thing. Is it saying that it takes more and more computing resources to train a machine learning system so that it can reach new milestones, or is it saying that the increasing availability of such resources is driving these new milestones more often?

    If advancements in computing power are growing at a slower pace than the historical trend line over the last few years—which I recall seeing on a graph of the top 200 supercomputers over time—will the interval between new milestones start to increase?

    • They are using `compute` to mean the amount of raw computing power needed or used.

      Quote: “Training compute (FLOPs) of milestone Machine Learning systems over time.” Unquote,
      from the caption above the graph.
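For a concrete sense of what a number like “training compute (FLOPs)” measures, a commonly cited rule of thumb for dense transformer training (a figure from the scaling-laws literature, not from the post itself) is roughly 6 FLOPs per model parameter per training token:

```python
def training_flops(num_parameters: float, num_tokens: float) -> float:
    """Rough training-compute estimate for a dense transformer,
    using the common ~6 FLOPs per parameter per token rule of thumb."""
    return 6.0 * num_parameters * num_tokens

# Hypothetical example: a 1-billion-parameter model trained on 20 billion tokens.
print(f"{training_flops(1e9, 2e10):.2e} FLOPs")  # ~1.20e+20 FLOPs
```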

  5. The next era of machine learning won’t take off until the Big Boys understand the answers to the Hutter Prize FAQ. Of course, this goes back to the wrong turn taken in the philosophy of science when Solomonoff’s papers on induction were drowned out by Popper and Kuhn’s popularizations of “the philosophy of science” that elevated “falsification” to a dogma while ignoring algorithmic information theory. That’s over a half century of setback, not only in AI but in science, by those two “geniuses”.

    • Have you read On Intelligence, or, more recently, A Thousand Brains, by Hawkins? My money (some of it at least) would be on his work.

      But so far as Solomonoff’s work versus that of either Popper or Kuhn? I do see potential value in the work of the former, but I don’t believe that the latter two have done any damage, at least not so far as regards machine learning, as I can’t imagine anyone seriously at work within the field seeing the latter two as providing any sort of model for predictive cognition.

  6. ML is part of AI but not all of AI. That distinction tends to get glossed over nowadays and leads to the general public being confused about what AI is capable of. That is, huge improvements in ML do not mean that AI in general has improved equally.
