How Much Compute and Video to Solve Real-World Superintelligence?

Yann LeCun is a giant in the field of AI but also a major skeptic of the ultimate potential of large language models. He doubts that LLMs can reach TRUE AI (aka real AGI or actual superintelligence), because he believes LLM-based AI will not be able to learn the physics of the real world.

Yann LeCun is a French computer scientist regarded as one of the fathers of modern deep learning. In 2018, he received the Turing Award, often called the “Nobel Prize of Computing.” He is currently a professor at New York University and Chief AI Scientist at Meta (formerly Facebook), where he continues his research on machine learning algorithms. His work underpins today’s AI landscape, influencing technologies such as speech recognition, satellite image analysis, and recommendation systems.

He believes the neural networks behind large language models will fail to take the next step, and that Tesla FSD will fail to solve robotaxi. This is despite thousands of Tesla cars driving without drivers from factory to loading dock in Fremont every day, day or night, rain or shine. There are also hundreds of thousands of Teslas driving without human drivers for up to 85 meters using Actually Smart Summon (parking spot to summoner) or dispatching from the summoner to a parking spot. Summons and dispatches happen every day in the US, Mexico, Canada and now China.

In June 2025, the Tesla robotaxi breakthrough should see cars without human drivers giving paid rides in Austin, then spreading around the world to hundreds of thousands of cars and to millions of cars in 2026.

It took about 6 billion miles of driving data to reach this point. That is roughly a minute of video per mile: it takes one minute to drive a mile at 60 miles per hour, two minutes at 30 miles per hour. So 6 billion miles of driving data is about 11,000 years of video.
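A quick back-of-envelope check of those figures, assuming one minute of video per mile (i.e. a steady 60 mph):

```python
# Convert the article's 6 billion driving miles into years of video,
# assuming ~1 minute of video per mile (60 mph average speed).
MILES = 6e9
MINUTES_PER_MILE = 1.0   # at 30 mph this would be 2.0, doubling the result

hours = MILES * MINUTES_PER_MILE / 60
years = hours / (24 * 365)

print(f"{hours:.3g} hours ≈ {years:,.0f} years of video")  # ≈ 11,400 years
```

At a 30 mph average the same mileage would be roughly 23,000 years, so ~11,000 years is the low end of the range.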

Yann LeCun (AI legend but LLM AI skeptic) says it takes a teenager about 20 hours to learn to drive. However, it takes roughly 12 years of visual experience to prepare a child to be ready to learn to drive. LeCun discusses how it takes a child about 4 years to learn the basics of the world. So he knows it takes time to build real-world knowledge and an internal world model before the learn-to-drive step can happen.

This means Tesla AI and FSD are roughly 1,000 to 10,000 times less efficient than humans at learning to drive. However, learning to drive near-perfectly (the robotaxi goal of over 10 times safer than a human) probably takes a human another 5-10 years, and humans might never reach ten times safer than the average human. Against that near-perfect driving standard, the AI is closer to 500 times less efficient. Let us assume 1,000 times less efficiency for LLMs. Then ~50,000 years of video training data should be enough to cover the real world and most real-world learning of any kind that humans do.
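The efficiency ratios above can be reproduced directly from the article's own estimates (the year counts are the article's assumptions, not measured values):

```python
# Ratio of AI training-video years to human learning years,
# using the article's estimates.
ai_video_years = 11_000      # video-years to reach current FSD capability
human_learner_years = 12     # childhood world-model before learning to drive
human_expert_years = 22      # adding ~10 more years toward near-perfect driving

print(ai_video_years / human_learner_years)  # ≈ 917, i.e. roughly 1000x
print(ai_video_years / human_expert_years)   # = 500x
```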

This inefficiency could end up not mattering if we gather 100,000 years, or even millions of years, of video of the real world. We would then need roughly 1,500 times more compute than was used in January 2025, arriving in about two to three years, to process it. I think this will occur around 2027-2029. Using millions of cameras to gather data for 2-3 years would enable the collection of millions of years of video for training.

I have tracked the construction of the xAI data center in Memphis. xAI is adding gas turbines that will enable 1.2 GW of power around the end of this year. This will be in the range needed to power 1 million GPUs (Nvidia B200s). There are many better chips coming.

I have projected that, with higher rack density, it is possible to put 3 million chips into the Memphis xAI building. This would require tripling the power delivered to the building, which is also possible, with a 2027-2028 completion. That is 30X more chips than were used for Grok 3 (100,000 H100s). Using Nvidia Ultra or Dojo 4 chips that are 100X better than the H100, and training for 3 times longer, gives 9,000X more compute, plus other improvements to networking, memory and AI models.
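The 9,000X figure is simply the product of the three assumed ratios (chip count, per-chip performance, training duration), all of which are projections rather than confirmed specifications:

```python
# Compute-scaling multiplier vs the Grok 3 training run (100k H100s),
# using the article's projected ratios.
chip_count_ratio = 30     # 3 million chips vs 100,000 H100s
per_chip_ratio = 100      # assumed next-gen chip vs H100 performance
training_time_ratio = 3   # training run 3x longer

total_compute_ratio = chip_count_ratio * per_chip_ratio * training_time_ratio
print(total_compute_ratio)  # 9000
```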

This supercluster will need to process 20 quadrillion tokens of video and other data.
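As a rough sanity check, if those 20 quadrillion tokens correspond to about the 100,000 years of video discussed earlier (both figures come from the article; mapping one onto the other is my assumption), the implied tokenization rate lands at a plausible video-token budget:

```python
# Implied video tokens per second, assuming the 20 quadrillion tokens
# represent ~100,000 years of video (both figures from the article).
tokens = 20e15                 # 20 quadrillion tokens
video_years = 100_000          # assumed size of the video corpus
seconds = video_years * 365 * 24 * 3600

tokens_per_second = tokens / seconds
print(f"{tokens_per_second:,.0f} tokens per second of video")  # ≈ 6,300
```

Roughly 6,300 tokens per second of video is in the same order of magnitude as, say, a few hundred tokens per frame at standard frame rates, so the two figures are at least mutually consistent.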

Tesla plans to make 1 million Teslabots in 2027. There will be 5,000-10,000 this year and 50,000-100,000 in 2026. Cumulatively, that would mean on the order of 100,000 humanoid-bot-years of data and video from those 2025-2026 bots.
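Cumulative fleet data-years under the (optimistic, my own) assumption that every bot shipped logs a full year of video:

```python
# Cumulative humanoid-bot data-years through end of 2026, assuming each
# bot operates (and records) for roughly one full year after deployment.
fleet_2025 = (5_000, 10_000)      # article's estimate for this year
fleet_2026 = (50_000, 100_000)    # article's estimate for 2026

low = fleet_2025[0] + fleet_2026[0]
high = fleet_2025[1] + fleet_2026[1]
print(f"{low:,} to {high:,} bot-years of data by end of 2026")
```

That gives 55,000 to 110,000 bot-years, bracketing the article's ~100,000 figure.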

13 thoughts on “How Much Compute and Video to Solve Real World Superintelligence ?”

  1. Will people please stop using the words “superintelligence”, “singularity”, or “infinity” until one can actually define what any of those words mean? Last time I checked, no one has. If I’m wrong, please someone yell at me and let me know. If it’s this Earth-shattering, feel free to yell at me. I do scan multiple scientific journals. If I didn’t notice answers to what these terms actually mean, I am SERIOUSLY stupid. Embarrassing, but if true? Please, just let me hide under a rock. A big one.

    • As Charlie Munger, one of the wisest people who ever lived, said, it’s better to be directionally right than precisely wrong. There is no specific and verifiable definition of these terms because nobody knows what they mean in detail, but we all understand what we are directionally talking about. Live with it. If you are waiting for this to be covered in scientific journals with dozens of PhD students working on it, you will have missed the boat.

      Fun topic: find the definition of “Marketing” as accepted by all academics and industry.

  2. Hopefully we can brute-force an inefficient AGI that will teach us how to build a more efficient neuromorphic AI.

  3. Reading your latest articles on AI, it seems that the only factor that explains intelligence is computing power. For something as complex as intelligence, this seems very simplistic.

    • You can also make a fission bomb by putting together enough enriched uranium-235 or plutonium, and you get a nuclear explosion. Put together the right compute with enough energy and there is an intelligence explosion. Sometimes things are simple at the level of a general overview. Actually enriching uranium is hard. Actually making the right chips and the right models was hard.

      • That’s not a good analogy: an explosion is a gigantic increase in entropy/loss of order. Sudden intelligence is an even more gigantic increase in order.

      • If I may, buddy, there’s a big difference between building a bomb and insight. Put together a computer that’s very fast, with access to lots of data, and yes, it can work so fast it gives the impression it’s self-aware. But IMO, being quick on the draw does not mean you know when to draw in the first place. Biology is (sorry) slow when it comes to being a calculator, but it’s immensely robust. A virus or bacterium that attacked you, you never noticed; that kind of thing happens literally several million times a day. Notice it? Very rarely. As I said, living things are intensely robust. Like your person, for which I’m grateful.

        Yes, I do understand Occam’s Razor: the simplest explanation is usually correct. But the simple, easiest answer to any question may not be right or correct. You have to ask the right questions to get the right answers. By the way, enriching uranium is NOT hard. (I’m not a physicist.) But I DO know it only takes enough time and energy to make it happen. Unfortunately, I know I’m not the only guy who knows how to do this. And it’s NOT my field, nor a field I want anything to do with. I love to invent things, like transparent solar cells (think windows) and self-repairing surfaces (run into a pothole lately?). Hopefully, in the future, your grandpa will complain about potholes and your kids will have no idea what he’s talking about. If I can make our world less complicated, for people to enjoy their lives, what joy…

        Sorry, I do talk too much. Thanks for putting up with me.

  4. Available training data has been exhausted. Present efforts are to refine the training on existing data. The need to train and the market for training chips will collapse in two years. Only humans can expand the data set by creating new data using the experimental method.

    • I disagree. R1 uses synthetic/expanded data to train its reasoning ability. A problem with a known answer is expanded into 20 versions, and the network is trained with reinforcement learning to solve them. This way, the R1 network discovered that it pays off to reason longer.

      Who is to say that this method cannot be expanded further?

      • Some research has shown that when AI-generated data is fed back into the system, it pollutes or poisons the data set, which becomes increasingly unreliable. Apologies, but I am unable to find the reference for this at the moment as I am short on time. Perhaps somebody could help out?

      • AI generated synthetic data can pollute the training data leading to more errors. It is called “model collapse”. There have been a number of articles published as well as a paper in Nature.

      • Humans use synthetic data. We call it imagination, hallucination, creativity. Our imagination is constantly proposing and our rational function is constantly disposing. A lack of rational disposing power leads to schizophrenia. Imposing imagination on AI adds a disposal burden and risks AI schizophrenia.
