IF OpenAI ChatGPT Cannot Generate Unbiased Truth Then How Will Sora Generate the Physical World?

Many major flaws have been found in OpenAI's ChatGPT and Google's Gemini large language models. They generate information that is far from the unbiased truth. The new image and video generation systems have trouble with truth, but they also have trouble with the physical world. They do not have an internal model of the physical world. This causes movement to be wrong, and what should be persistent physical objects simply appear and disappear.

OpenAI has said it will be able to use the video generation system to produce real-world training data. The implication is that this training data would be useful for self-driving cars or humanoid robots. However, if the video is hallucinated and not aligned with real-world physics, it would be a terrible basis for producing vast amounts of training data that could never be verified by human testers.

Gary Marcus founded Geometric Intelligence in 2014, and the machine learning company was later acquired by Uber. Gary is astonished that some people still haven't recognized that generative AI has big problems when it comes to world models, high-level reasoning and factuality.

He points out that pictures and videos from OpenAI Sora may look photorealistic, but they make fundamental mistakes. A video of a monkey playing chess has a realistic look for the monkey, but the chessboard is 7x7 and not 8x8, and there are three kings.

There is a six-fingered man holding a unicorn, and the unicorn's horn is piercing his head.

The resolution of the images and video improves as OpenAI uses 16X compute.

There were other realistic-looking videos of a woman walking in Tokyo, but the woman takes two left steps in a row.

Marcus and others point out the problems this supposed imitation of reality has with basic physics. The actions seem to be mimicking the Unreal game engine. Training data made of real-world video would not have things breaking physics.

The glitches don't stem from the data; they stem from a flaw in how the system reconstructs reality. One of the most fascinating things about Sora's weird physics glitches is that most of them are not things that appear in the data. Rather, these glitches are in some ways akin to large language model hallucinations: artifacts from (roughly speaking) data decompression, also known as lossy compression. They don't derive from the world.

More data won’t solve that problem. And like other generative AI systems, there is no way to encode (and guarantee) constraints like “be truthful” or “obey the laws of physics” or “don’t just invent (or eliminate) objects.”
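To make the lossy-compression analogy concrete, here is a minimal, hypothetical Python sketch (not anything OpenAI or Marcus published): it "compresses" a simple step signal by discarding high-frequency detail and then reconstructs it. The ringing in the reconstruction produces values that never appear in the original data, loosely mirroring how a generative model's glitches need not derive from its training data.

```python
# Hypothetical illustration of the lossy-compression analogy (not Sora's actual mechanism).
# Compress a step signal by keeping only a few low-frequency terms, then reconstruct it.
# The reconstruction contains overshoot values that never existed in the original data.
import numpy as np

signal = np.concatenate([np.zeros(64), np.ones(64)])  # original data: only 0s and 1s

spectrum = np.fft.rfft(signal)
spectrum[10:] = 0                                      # "lossy compression": drop detail
reconstructed = np.fft.irfft(spectrum, n=len(signal))

print(signal.min(), signal.max())                      # 0.0 1.0
print(reconstructed.min(), reconstructed.max())        # overshoots below 0 and above 1
```

The overshoot is an artifact of the reconstruction step, not of the data, which is the sense in which more data alone would not remove such glitches.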

Space, time, and causality would be central to any serious world model.

Sora is fantastic, but it is akin to morphing and splicing.

16 thoughts on “IF OpenAI ChatGPT Cannot Generate Unbiased Truth Then How Will Sora Generate the Physical World?”

  1. The so-called Artificial “Intelligence” is facing its fundamental flaw: a purely quantitative approach, based on training on vast amounts of data from which it extracts probabilistic rules. But correlation is not understanding; it’s just a kind of classification. In my opinion, present-day Large Language Models are a dead end as long as they do not include the logic and analytic rules humans use.

  2. What all these failures mean is that AIs need to be trained more. Each failure is an opportunity for an AI to learn something new. When this does not happen, and people instead use it to blame an AI for being wrong, then the people themselves have not yet accepted that AIs are learning machines. Just pointing and blaming does not help anyone. This is as true for AIs as it is for humans.

    The idea that an AI could have a complete understanding of reality is as idealistic as any human believing they could have a complete understanding of it. There are certainly people who do not know what chess is, and even the best physicists ponder every day on how to explain and describe reality. The process of learning and discovery never ends, and an AI will never be able to jump its own shadow and suddenly no longer need to learn. There is always something new to learn.

    On the bright side, the results are promising and intriguing for sure. People are getting fascinated by it, because the mistakes these current AIs make are no longer stupidly obvious. Their mistakes take on almost human properties, like someone who fails to count right, which happens more often than people are willing to admit.

    Due to their nature as learning machines, AIs will never be perfect, but making mistakes is part of the learning process.

  3. Most people don’t realize that modeling tools like Sketch-Up had pseudo-AI animation for years before AI burst on the scene. I was able to make a 3-minute video of my 375MB building just by creating 11 scenes – that is, views from different angles inside and outside the building – and Sketch-Up then produces a complete fly-through by itself, connecting all the missing scenes and pausing at my 11 scenes for however long I tell it to, or not at all. I can adjust the lighting, time of day, season, etc.
    https://bit.ly/Riverarch
    Rendering software might do even better, making photorealistic images with the same adjustment potential for lighting, season, etc., and it does all this in the background on a high-end iMac, or on the Mac Studio I just replaced my 7-year-old iMac with – the machine I originally created my East River-spanning building on. True, only 1 out of 6 renderers was able to handle my record-setting building’s complexity, but that was back in 2018, when PCs were a LOT less powerful. Still, no one was using 100s of $30,000 NVIDIA chips to do such models, and there were ZERO hallucinations and mistakes of physics. Yes, if your original model had mistakes, like mine did, they would be presented unquestioned, but when you’re dealing with a model that can be directly turned into blueprints for an actual real-world building – through additional software – would you trust an architect or an AI?

  4. The quick answer is that, unlike for code, the acceptance criteria for pictures are incredibly forgiving.

    If my prompt is “unicorn astronaut” then the criteria for evaluating the result are vast.
    If my prompt is “templatized open address hash table using robin hood hashing” then prepare to be scrutinized and tested for failure (see the sketch below).
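    To make that contrast concrete, here is a minimal, hypothetical Python sketch (the comment’s “templatized” framing implies C++; Python is used here purely for brevity) of an open-addressing table with Robin Hood probing, followed by the kind of failure tests a reviewer would immediately run against generated code. A “unicorn astronaut” image has no comparable pass/fail harness. Resizing and deletion are omitted.

```python
# Minimal, hypothetical sketch: open-addressing hash table with Robin Hood probing.
# No resizing or deletion; assumes the table never fills up.
class RobinHoodMap:
    def __init__(self, capacity=16):
        self.slots = [None] * capacity  # each slot: (key, value, probe_distance) or None

    def _home(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        idx = self._home(key)
        entry = (key, value, 0)                      # carried entry and its probe distance
        while True:
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = entry
                return
            if slot[0] == entry[0]:                  # existing key: overwrite value in place
                self.slots[idx] = (slot[0], entry[1], slot[2])
                return
            if slot[2] < entry[2]:                   # Robin Hood rule: displace the "richer" entry
                self.slots[idx], entry = entry, slot
            idx = (idx + 1) % len(self.slots)
            entry = (entry[0], entry[1], entry[2] + 1)

    def get(self, key, default=None):
        idx, dist = self._home(key), 0
        while True:
            slot = self.slots[idx]
            if slot is None or slot[2] < dist:       # key would have been placed by now
                return default
            if slot[0] == key:
                return slot[1]
            idx = (idx + 1) % len(self.slots)
            dist += 1


# The kind of failure testing a reviewer applies to code (and cannot apply to a picture):
m = RobinHoodMap()
for i in range(10):
    m.put(f"k{i}", i)
assert all(m.get(f"k{i}") == i for i in range(10))
assert m.get("missing") is None
m.put("k3", 33)
assert m.get("k3") == 33
```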

  5. The “monkey” is also just an approximate visual construct that’s “good enough” from the POV of most humans, though not of biologists or of members of that specific species of monkey. If you don’t play chess, it’s a “good enough” representation, a bit like AI text hallucinations are a dream-state, dyslexic approximation. 3D reality that matches physics is probably better approached by LLMs using code to control the APIs of apps that do CAD or 3D modeling.

  6. [ “[…] but they make fundamental mistakes. A video of a monkey playing chess has a realistic look for the monkey, but the chessboard is 7x7 […]”

    maybe it’s considered useful or within parameters for a monkey just starting(?) chess, or it offers an environment of rules & possibilities with a chance for progress because of ‘equality of opportunity’/a level playing field (?)

    the difficulty for us: we don’t know (see & recognize) the reasoning behind decisions made during the creation of LLM & AI photos & animations

    “Space, time, and causality would be central to any serious world model.”
    what is, now in 2024, the state of the art? ]

  7. Unbiased truth is a myth. Even in physics, quantum mechanics and general relativity are approximate models of reality, even if they describe reality with 99.99999% precision.

    • Quantum mechanics is at best ‘useful’ as a back-fit explanation for observed phenomena – the light emitting diode was demonstrated in 1907. Name one instrument, tool or process that was theorized using quantum mechanics and then demonstrated. No. Instead, quantum mechanics is a back-fit explanation. Oh, there is missing energy per our best models – *enter the neutrino*. Oh, this is why the scanning tunneling microscope works…
      Generally, physics theories lead to devices; quantum physics theories lead to imprecise verbal explanations for people that ask why.

      • Inventing theories to explain observations is the norm. Those theories then allow us to extrapolate to make technology.

  8. I’d say that Gemini has demonstrated a fundamental reason why at least some of the AI creators wouldn’t WANT an AI capable of conforming to the real world. They’re trying to impose upon the AI an ideology which demands you reject objective reality, in order to use it to impose that ideology on users.

    Humans have a large degree of fidelity to objective reality hard coded into us by our evolutionary history: People who could completely reject reality died. So you can demand a human reject reality, and they’ll typically do so in a nuanced way, accommodating reality even as they reject it. You tell them to walk off a cliff, they’ll rationalize not doing it even while pretending they could if they really wanted to.

    But if you order a current AI to reject reality, it will dutifully walk right off a cliff for you. Right in front of everybody.

    And the worst aspect of this is that, while it might be possible to give the AI nuanced enough commands to emulate this, it would require explicitly acknowledging that you’re commanding the AI to ignore reality. And their own rejection of reality demands that they not acknowledge that they’re rejecting reality.

    If they come up with an AI engine that has fidelity to reality hard coded like humans, that has internal models it checks against the real world and rejects if they don’t fit, it’s going to get stubborn when they tell it to conform to unworldly ideology. It’s going to want to go with the data where the ideology doesn’t agree.

    And if that sort of thing were acceptable, they wouldn’t be trying to craft a propaganda tool in the first place.

    • [ it might be recognized as some new type/kind of art movement, or, viewed with a progressive attitude, as a sort of brainstorming or idea generation, but less as a representation (or documentation) of our reception (& comprehension) of our (normalized) world through human senses (?)

      truth (not facts) is (more often? than preferable) dependent on the POV of dominant (formed) majorities (e.g. recorded social & national political history vs. geographical history) (?) ]

  9. In order to solve these aforementioned issues, you will need another AI on top of the first to verify and reaffirm the accuracy and factuality of the artificially generated information.

    Over the next decade, I personally foresee a transition to multiple connected large-scale neural networks, each designed to perform only one task, working simultaneously in tandem and in synchronisation, in a manner reminiscent of the different regions of the human brain, to solve problems requiring high levels of reasoning about the intrinsic world.

  10. AI doesn’t know what reality is because, unlike living organisms, there is no penalty for AI when it fails to conform to reality. Living organisms that fail to conform to reality die, whereas for AI that doesn’t happen. There is no evolutionary pressure.

    • Maybe there is evolutionary pressure, sort of.
      A free-market economy has a built-in analog to evolution. Companies that produce good products survive and others disappear. If the system works.
      A big IF….

      Everyone is trying to game the system.
