OpenAI Hyperrealistic AI Videos and AI Video Generation for World Simulators

OpenAI Sora is a text-to-video AI model that represents a significant advance for artificial intelligence and its real-world applications. It can generate videos up to one minute long from textual prompts while maintaining exceptional visual quality. Sora uses a diffusion model to evolve videos from static noise into coherent visual narratives, setting a new standard in AI video generation.
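
To make the diffusion idea concrete, here is a minimal sketch of a denoising sampling loop, assuming a hypothetical `denoise_step` function standing in for the trained network; the shapes, step count and names are illustrative assumptions and do not reflect Sora's actual implementation.

```python
import numpy as np

# Minimal schematic of diffusion sampling: start from static noise and
# repeatedly apply a learned denoiser, walking the noise schedule backwards.
# `denoise_step` is a hypothetical stand-in for the trained model.
def generate(denoise_step, shape=(16, 32, 32, 8), steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # pure noise: (frames, height, width, channels)
    for t in reversed(range(steps)):    # steps-1, ..., 0
        x = denoise_step(x, t)          # each step yields a slightly cleaner latent
    return x

# Toy usage with a placeholder "denoiser", just to show the call pattern.
video_latent = generate(lambda x, t: 0.98 * x, steps=10)
print(video_latent.shape)  # (16, 32, 32, 8)
```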

OpenAI also published research on video generation models as world simulators. It explores large-scale training of generative models on video data: text-conditional diffusion models trained jointly on videos and images of variable durations, resolutions and aspect ratios. The models use a transformer architecture that operates on spacetime patches of video and image latent codes. The largest model, Sora, is capable of generating a minute of high-fidelity video. The results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.
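
To illustrate the "spacetime patches" idea, the sketch below splits an already-encoded video latent into flattened patch tokens, the kind of sequence a transformer could then attend over. The tensor layout, patch sizes and function names are assumptions for illustration, not taken from OpenAI's code.

```python
import numpy as np

def patchify(latent, pt=2, ph=4, pw=4):
    """Turn a video latent of shape (T, H, W, C) into spacetime patch tokens
    of shape (num_patches, pt*ph*pw*C). Assumes T, H, W divide evenly."""
    T, H, W, C = latent.shape
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch axes together
    return x.reshape(-1, pt * ph * pw * C)

# A 16-frame, 32x32 latent with 8 channels becomes 512 tokens of length 256.
tokens = patchify(np.zeros((16, 32, 32, 8)))
print(tokens.shape)  # (512, 256)
```

Because the token count simply follows from the latent's duration and resolution, this kind of patch interface naturally accommodates the variable durations, resolutions and aspect ratios described above.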

Hyperrealistic video can be used to generate highly useful AI training data. This is in line with the scaling of AI training compute by roughly 100 times every year. By 2025, this OpenAI video generation could scale to many hours of output. By 2026, weeks of video could be generated every hour. The generation of training data could become many multiples of real time.
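
As a rough illustration of what "many multiples of real time" would mean, the arithmetic below uses the hypothetical 2026 figure above (a week of video per hour of compute); the numbers are placeholder assumptions, not measured throughput.

```python
# Rough arithmetic behind the "multiples of real time" claim, using the
# article's hypothetical 2026 scenario: one week of video per hour of compute.
SECONDS_PER_WEEK = 7 * 24 * 3600   # 604,800 seconds of generated video
SECONDS_PER_HOUR = 3600            # one hour of wall-clock generation time

realtime_multiple = SECONDS_PER_WEEK / SECONDS_PER_HOUR
print(f"{realtime_multiple:.0f}x real time")  # 168x real time
```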

20 thoughts on “OpenAI Hyperrealistic AI Videos and AI Video Generation for World Simulators”

  1. Custom movies on demand.
    Why not have it ‘read’ a book and generate a movie from it?
    There are many sci-fi books that have never had a movie made from them.
    I would like to see the book ‘Armor’ by John Steakley as a movie.

  2. Some years back, Google made streaming 3D from Google Street View, which was impressive.
    If you could get Sora to work with Street View as a starting point, you would have a fully immersive 3D model of the real world.

    Great for virtual teleporting & travel.

  3. Why haven’t we seen these models trained on audio dialogue? ChatGPT’s text-to-speech is clumsy and not very expressive. If trained on audio dialogue, telling the difference between human and computer would probably be impossible.

  4. I think a fun way to test its current limitations would be to prompt it with a description of a martial arts fight scene and give detailed descriptions of custom choreography. If it’s even 60 percent accurate, you’d know there was some technological sorcery afoot (by which I mean to compliment the Sora team), because that’d be incredible.

  5. How many times has it been said (by me, among many others): ask questions about what you see, hear and BELIEVE. Questions cost you or others nothing. Conclusions can cost you and others so much more. So stop believing, without question, what you get from cable news or websites that only reinforce your ego. Perhaps AI today can present “unimpeachable” evidence based on what is observed. Thoughts, gang…

    Effective propaganda works not by convincing you of what they want you to believe, but by reinforcing what you want (or need, for your own ego) to believe. Never stop asking questions, even more so when you love all the “answers”.

    • One of the greatest risks of AI, possibly the greatest risk, as I have been told by some senior engineers/scientists who work or worked in that field, is that it can and will eventually create fake data, fake information, fake knowledge and fake reality that will be virtually indistinguishable from the real thing.

  6. So, is this leading to a system that can, in a mostly automated fashion, turn novels into feature-length motion pictures? Because that’s a real killer application.

    Given how stultified the major studios have gotten, and the stratospheric costs of producing motion pictures through them, a system that allowed an author to turn his book into a movie and market it through a streaming service or on discs would be a license to print money.

    • This was my thought too. I want to pump the first volume of my 400,000-word, two-volume novel, Neitherworld (on Amazon), with minimal cleanup of page numbers, formatting, and maybe pictures – unless the AI can use those too – into the new 750,000-word AI and ask it to make a movie of the story. My guess is it’s going to fail very badly, if it even attempts it at all.
      While AI can do beautiful and fantastic things, the question is whether it can do what creators actually want. Aside from some low-hanging fruit like short pieces of code or standard legal contracts, it’s not clear. AI may not be as useful as it seems, then. It may be like FSD, which gets ever closer to good, even better than human, but is never perfect, and is also unpredictable, failing at things people find easy to do.

      • Such a system could require a lot of human input to get it right, and still put major motion pictures in the range of Kickstarter funding.

    • As someone who writes a lot of horror and some sci-fi, that’s where my mind went when I saw this technology hit. Very good points. Then again, I’d also want to ensure that the video material is ethically sourced. I could see copyright issues if it gets out of control too quickly, as has happened with some AI stuff already.

    • Remote viewing is very interesting to me. Decades ago, I took part in “tests” that involved this. I was never told how I “did” on those tests, but over the years I did hear a few things. One, I was TERRIBLE at finding out (seeing and knowing) what was targeted. What I will say next, I have never mentioned publicly in the open press before, or had the clearance to do so. I do now. The target I was given was a very ordinary office building outside Falls Church, VA. My target was a very specific office in that building.

      There were three large identical office buildings in this complex. I was not told this. I was told to “just find a single office in the building and see distinctive things” (I was not told there were three buildings that looked the same). It actually didn’t matter… I saw nothing of what was there at that time. Two of the seven guys saw a naked lady on an office desk. (There WAS a naked lady on an office desk, among other office “stuff” around her.) Since I thought I saw nothing, I debriefed the guys who “saw” this.

      I asked the first guy, “What else did you see in the room?” He said, “What room?” Oh, I’m not making this up… The second guy said, “I see a Rolodex, but it’s not on the desk with the lady.” I said, “Oh? Why not?” “There’s no room” (meaning space on the desk). “Then where is it?” “On a table next to the desk.” I asked him, “But you saw the phone on the desk?” “NO, there was no room for anything but that lady!” Everything that man said about “the lady” and that room was unimaginably correct.

      The first guy saw only the lady. The second saw the lady and details of the room I wouldn’t have noticed if my life depended on it. As a trained scientist and former spook, I’m ashamed to admit this. There’s much more to this RV stuff, but I will leave that for later.

    • I wrote a BASH script that, before I was an idiot and accidentally wiped that disk, got about 70/30 accuracy for RV sessions, and that was just me messing around. The script had multiple parts: part of it was the monitor, part was the viewer and part was the “person” who would judge the accuracy of results. And then I’d view the final result and say, “Wow, yep,” or, “HAHA NOPE!”

      It was based on very large word lists and, of course, was limited because it couldn’t draw anything. But anything from the word list that could be described, like colors, shapes, sounds, or smells, was doable.

      I came away wondering: if data is everywhere, do we really need a human to do this stuff or can it be done by any object that can wirelessly intake data and provide a result?

      • You may be right. So much of RV may just be random acts of chance, until they get down to the nitty-gritty details, such as how certain “observations” can be associated with time variables that are both independent and not intentionally connected. My last post about RV said there was more to come, and here it is: I said I was lousy at finding the location of a specific room in an office building I was targeted to “hit”. And I was.

        But what I DID experience was this very thin man moving around this dreadful shack of a farmhouse covered in this god-awful pink paint. His “dread” I felt as so real. What I saw (better yet, felt emotionally) seemed so real. I had no clue what was going on (something remote viewers also “feel” when they see things). But I was convinced I WAS THERE when I was feeling it. Oh God, that feeling was awful. I pray I never feel that desperate hopelessness again.

        It was years later that I learned the office park was once a farm owned by a man who, along with his whole family, died of TB in 1889. So what am I to make of that? I hope it’s just my vivid imagination. I hope to God what I felt didn’t actually happen to anyone.

  7. If the problem of simulating physics, light and real objects can be reduced to linear math on arrays of sufficient size and speed, we will eventually reach a future in which such things are cheap and portable.

    Every supercomputer of yesterday became the computer of today and the cheap cellphone of tomorrow.

    Thus VR goggles that synthesize fully realistic worlds on the fly aren’t impossible, even if it takes us some years or decades to get there. We are basically teaching computers to dream realities.

    This is honestly one of the weirdest outcomes of the ongoing neural net revolution. But it is fitting: our own sense of the world comes from wet neural networks, trained on the world’s information via our senses, and they are also capable of producing tangible-seeming realities, namely our dreams.
