AI Scaling Laws Guide Building to Superhuman Level AI

Scaling laws are as fundamental to artificial intelligence (AI) as the law of gravity is to the physical world. Cerebras makes wafer-scale chips optimized for AI, and those chips can host large language models (LLMs). Cerebras trains its models on open-source data that developers around the world can reproduce.

James Wang, a former ARK Invest analyst, is now a product marketing specialist at Cerebras.

In this interview, James discusses LLM development and why the generative pre-trained transformer (GPT) innovation taking place in this field is unlike anything that has come before it, with seemingly limitless possibilities. He also explains the motivation behind Cerebras' unique approach and the benefits that its architecture and models provide to developers.

Key Points From This Episode:

– Why Cerebras attracted James.
– James explains the concept of wafer-scale computing and why it is so advantageous in the AI space.
– A historical overview of large language model (LLM) development.

– What James believes to be the most significant natural law discovered this century.

OpenAI found that large language model performance scales predictably across seven orders of magnitude: they made models 10 million times bigger, and performance kept scaling.

James believes this is the most significant law.

The counterpoint to this was DeepMind's Chinchilla paper (March 2022), which found that LLMs are compute-optimal at a ratio of roughly 20 training tokens per parameter. The paper was hugely influential: instead of a race to more and more parameters, there was a race to more and more tokens.
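The 20-tokens-per-parameter rule can be sketched in a few lines. Note that the 6 × N × D FLOPs estimate of training compute is a standard approximation from the scaling-law literature, not something stated in the episode:

```python
# Minimal sketch of the Chinchilla rule of thumb: a compute-optimal
# LLM is trained on roughly 20 tokens per parameter.
TOKENS_PER_PARAM = 20

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Training tokens suggested for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation: total training compute ~ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

# Example: a 70B-parameter model (roughly Chinchilla scale)
n = 70e9
d = chinchilla_optimal_tokens(n)  # 1.4e12 tokens, i.e. 1.4 trillion
c = training_flops(n, d)
```

The example illustrates why the token race followed the paper: a 70B model "wants" about 1.4 trillion training tokens under this rule.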

– Why Cerebras wants to get state-of-the-art LLM data into the hands of as many people as possible.

Cerebras has made all of its state-of-the-art LLM work open source.

– The Cerebras-GPT scaling law.

Cerebras has confirmed that scaling laws transfer to downstream tasks. This makes it possible to determine how much compute and training are needed to achieve human or superhuman performance, and to design models with adequate AI performance for an iPhone, a laptop, or an edge computing device.
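The idea of predicting how large a model must be to hit a target performance level can be sketched as a simple power-law extrapolation. All numbers below are illustrative placeholders, not Cerebras measurements:

```python
import math

# Hypothetical sketch: fit a power law loss = a * N**b through two
# measured (model size, eval loss) points, then extrapolate to estimate
# the model size needed for a target loss.
n1, l1 = 1e8, 3.5    # small model: 100M params, illustrative loss
n2, l2 = 1e10, 2.4   # larger model: 10B params, illustrative loss

# Fit in log-log space; b comes out negative (loss falls as size grows).
b = (math.log(l2) - math.log(l1)) / (math.log(n2) - math.log(n1))
a = l1 / n1 ** b

def predicted_loss(n_params: float) -> float:
    """Extrapolate the fitted power law to a new model size."""
    return a * n_params ** b

def params_for_loss(target: float) -> float:
    """Invert the fit: model size needed to reach a target loss."""
    return (target / a) ** (1 / b)
```

In practice such fits use many measured points and task-level metrics, but the inversion step is the core of sizing a model for a performance budget.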

– Looking towards the future of LLMs.
– Standard practice when it comes to training an LLM (and the problems that developers have been battling for years).
– The potential advantage of Cerebras CS2 chips and computers.

The Cerebras CS-2 is optimized for the training problem. Cerebras chips let people train trillion-parameter models without the usual tangle of problems and delays, which greatly simplifies training. Cerebras re-architected its wafer-scale system so that compute is independent of memory size: they can train arbitrarily large language models without outgrowing the chip, pairing large compute with petabytes of memory.

This decoupling of compute from memory is the concept of disaggregated architecture.

– How the Cerebras approach differs from the approach taken by other companies (NVIDIA and Dojo, for example).

– Cerebras offerings that are available to be used by the public.

– The current GPU cloud shortage (and why this adds to the appeal of Cerebras software).

– Why the progress being made in the GPT space is incomparable to developments that have come before it.

– Potential directions that the world could be heading in as a result of AI developments (and why James is optimistic about it all).

– The AI use case that is keeping James up at night.

James is not worried about runaway AI as long as the physical world is "air-gapped" from digital AI. If AI begins recursively improving itself in the world of atoms, then we should absolutely be scared.

We need to air-gap more critical infrastructure.

4 thoughts on “AI Scaling Laws Guide Building to Superhuman Level AI”

  1. AI is a term used for way too many things. One of these involves it being used as a term for anything that is a workaround for actual intelligence (like artificial creamer is a non-dairy product that is meant to be a workaround for using actual cream as your creamer).

    A sufficiently advanced learning tree could conceivably have a response for pretty much everything and appear virtually indistinguishable from a real person, except that, if you were to ask it "Is there anyone in there?" its honest response might be: "Nope, no one at home. All appearances to the contrary, I am no more a sentient being than a 1950s toaster. I'm sorry if that makes you feel uncomfortable." To which you might ask: "You aren't really sorry, are you?" And it could reply: "No, it's just part of the right response when appearing to discuss this with a human, according to my ruleset."

    Another use of the term could refer to an intelligence, as good and versatile as our own, or even better, that is entirely synthetic in origin. Sometimes this is called strong AI or artificial general intelligence (AGI).

    Regardless, advances in AI of all sorts teach us more about ourselves and will likely enable us to further augment our native abilities. It may become something of a staircase, with both of us enabling each other to ascend further.

    In such a world, if an augmented you could create a mind much like your own, or better, and help it to learn and appreciate your own values, while also helping prepare it for an independent existence, how would this differ from it being your own child? Humans rearing adopted children, as well as the children themselves, can tell you that DNA does not necessarily define a parental relationship.

  2. Perhaps unfortunately, the trend in “air gapping” so far has been to decrease it, especially via e.g. Starlink….

    It's all making me very uneasy and worried about my kids' future, now on top of climate change, Putin, etc.

    • I don't understand why people have multiple worries. Shouldn't the only worry be about AI? If we do get AI right, won't it just solve all other problems?

      • I don’t think you can necessarily assume that. There will be limits to what AI can do, at least in the short to medium term.
