OpenAI Sora Model Generates Photo Realistic Video from Text and Google Gemini Pro 1.5 Has 1+ Million Context Length

OpenAI just announced their new Text-To-Video model called Sora.

Look at these insane examples:

* Space movie trailer featuring a man wearing a red wool knitted motorcycle helmet
* Fluffy animated alien
* Dwarf in a zen garden inside a glass sphere

5 thoughts on “OpenAI Sora Model Generates Photo Realistic Video from Text and Google Gemini Pro 1.5 Has 1+ Million Context Length”

  1. It still has many corner cases (e.g. don’t ask it to model a plastic chair) where it doesn’t know enough about some objects and lighting effects, so items end up looking floppy, morphing into one another, or sprouting limbs out of nowhere.

    I’d say it’s already about 80% there, right at the Pareto frontier between bare usefulness and the weird requests that would need some special additional training to model correctly.

    It will get better, though.

  2. I want to be able to describe a character and have it drawn. Then I want to be able to have that SAME character drawn later… it needs a memory of that character, or some way to output enough info that it can recreate that character later. I want to be able to age the character, place them in any environment, and have them displayed in any desired style.
    It needs to be REPRODUCIBLE… every image of that character, once drawn, should be able to be drawn again with the correct input.

    • There are ways to create a character sheet for a character and then train a LoRA on it (a sketch of the reuse step follows these comments), or at least that’s what YouTube has taught me (my desktop died soon after I started playing around with Stable Diffusion). It requires

  3. Wait, are you saying Gemini can take my entire 400,000-word, 2-volume Sci-Fi novel, Neitherworld: https://amazon.com/Neitherworld-Book-Akiiwan-Scott-Baker-ebook/dp/B07NHTTKC3/ref=sr_1_2 (I can combine the 2 PDFs into 1), and turn it into a video/movie? Do the 24 pictures I commissioned an artist to create help or hurt that effort? I’m about to receive a Mac Studio with 64 GB of RAM & 4 TB of storage. Is that adequate to store the resulting film & process the request?
    I’m not sure what this means for creators. LLMs can produce beautiful, fantastic things, but if they can’t deliver what creators actually want & imagine (more or less), are they really useful?

  4. If it is anything like DALL-E, then it is fun for getting A result out of it. If you want something specific, it can’t do it. “Make the monster green” will change not just the color of the monster, but everything else too.
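For readers curious about the LoRA approach mentioned in the reply above, here is a minimal sketch of the reuse step using the Hugging Face diffusers library. The base checkpoint, LoRA file path, trigger word (`sks_character`), and prompt are all placeholders, and the LoRA itself is assumed to have already been trained on a character sheet.

```python
# Minimal sketch: redraw a consistent character from a pre-trained LoRA
# with Hugging Face diffusers. Paths, model ID, and the "sks_character"
# trigger word are placeholders; the LoRA is assumed to already exist.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any Stable Diffusion 1.5 checkpoint (assumption)
    torch_dtype=torch.float16,
).to("cuda")

# Attach the character LoRA so the same character can be generated again later.
pipe.load_lora_weights("./loras", weight_name="character_lora.safetensors")

# Reuse the trigger word from LoRA training; vary age, environment, and style in the prompt.
prompt = "sks_character as an elderly man, walking through a rainy city at night, watercolor style"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("character_aged_watercolor.png")
```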
