OpenAI used up to $10,000 worth of compute for each AGI answer. At a rate of around $1.45 to $1.49 per H100-hour, $10,000 would cover approximately 6,711 to 6,897 GPU-hours on Nvidia H100s. That is roughly equivalent to running a cluster of 840 to 860 Nvidia H100s for 8 hours to compute a single answer.
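The cluster-size arithmetic above can be checked directly; note the $1.45–$1.49 per H100-hour rental rate is the article's assumption, not an official price.

```python
# Back-of-the-envelope check of the GPU-hour figures above.
# The hourly rates are the article's assumed H100 rental prices.
budget_usd = 10_000
rate_low, rate_high = 1.45, 1.49  # USD per H100-hour (assumed)

gpu_hours_high = budget_usd / rate_low   # ~6,897 GPU-hours
gpu_hours_low = budget_usd / rate_high   # ~6,711 GPU-hours

# Spread over an 8-hour run, that is roughly an 840-860 GPU cluster.
run_hours = 8
cluster_low = gpu_hours_low / run_hours
cluster_high = gpu_hours_high / run_hours

print(round(gpu_hours_low), round(gpu_hours_high))   # 6711 6897
print(round(cluster_low), round(cluster_high))       # 839 862
```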
Sequoia Capital describes how test-time reasoning and training are being used to get better results.
o1 is showing the ability to backtrack when it gets stuck as an emergent property of scaling inference time. It is also showing the ability to think about problems the way a human would (e.g. visualize the points on a sphere to solve a geometry problem) and to think about problems in new ways (e.g. solving problems in programming competitions in a way that humans would not).
There are new ideas to push forward inference-time compute (e.g. new ways of calculating the reward function, new ways of closing the generator/verifier gap) that research teams are working on as they try to improve the model’s reasoning capabilities.
Why OpenAI’s o3 Isn’t AGI
OpenAI’s new reasoning model, o3, is impressive on benchmarks but still far from AGI.
What is AGI?
AGI (Artificial General Intelligence) refers to a system capable of human-level understanding across tasks. It should:
– Play chess like a human. …

pic.twitter.com/yn4cuDTFte

— Levon Terteryan (@levon377), December 21, 2024
System 1 vs System 2 Thinking
This leap from pre-trained instinctual responses (“System 1”) to deeper, deliberate reasoning (“System 2”) is the next frontier for AI. It’s not enough for models to simply know things—they need to pause, evaluate and reason through decisions in real time.
Think of pre-training as the System 1 layer. Whether a model is pre-trained on millions of moves in Go (AlphaGo) or petabytes of internet-scale text (LLMs), its job is to mimic patterns—whether that’s human gameplay or language. But mimicry, as powerful as it is, isn’t true reasoning. It can’t properly think its way through complex novel situations, especially those out of sample.
This is where System 2 thinking comes in, and it’s the focus of the latest wave of AI research. When a model “stops to think,” it isn’t just generating learned patterns or spitting out predictions based on past data. It’s generating a range of possibilities, considering potential outcomes and making a decision based on reasoning.
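One simple way to picture this "stop and think" loop is best-of-N sampling with a verifier: a generator proposes several candidate answers and a separate scoring function picks the most promising one. The sketch below is illustrative only, not OpenAI's actual method; the task, `generator`, and `verifier` are all hypothetical stand-ins for a model and its reward function.

```python
import random

# Toy "System 2" loop: propose several candidates, score each with a
# verifier, and keep the best. Guessing a hidden number stands in for a
# real reasoning task; the verifier stands in for a reward function.
TARGET = 42

def generator(rng):
    """Stand-in for a model sampling one candidate answer."""
    return rng.randint(0, 100)

def verifier(candidate):
    """Stand-in for a reward function: higher scores are better."""
    return -abs(candidate - TARGET)

def best_of_n(n, seed=0):
    """Sample n candidates and return the one the verifier scores highest."""
    rng = random.Random(seed)
    candidates = [generator(rng) for _ in range(n)]
    return max(candidates, key=verifier)

# Sampling more candidates ("thinking longer") can only match or improve
# the best verifier score, which is the basic inference-time-compute bet.
print(best_of_n(4), best_of_n(64))
```

With a fixed seed, the 64-sample pool contains the 4-sample pool, so the larger compute budget is guaranteed to do at least as well — a crude analogue of spending more inference-time compute on the same question.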



Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts. He is open to public speaking and advising engagements.
I’d like to see what o3 would score on the EIT (FE) test engineers can take after graduating with a BS. It’s the beginning of the guild-like PE (professional engineer) track. Unlike lawyers and doctors, you can’t open an office as an engineer right after graduation with any degree. Someone who already has a PE, whom you’ve been working under, has to sign off before you can open a competing firm.

Lots of engineers pay the dues, but are not allowed to take the PE exam by their employers.
I sort of got off message. I left out that there is one for each engineering discipline, and lots of people take it, so there should be a big dataset of test results. It’s open book, very broad, and somewhat deep.
The one critical element to this, which people do not get at the moment, is the nature of the conceptual structures that are learnt within these larger models. The biological architecture and very slow propagation time between neurons within the cerebellum mean that our human conceptual architecture is very small compared to these models. These models are building concepts that, by virtue of our biological constraints, we can’t comprehend. I do wonder when this will become blatantly apparent.

At Alpha (circa 7 Hz) we iterate our thought pattern over a temporal span to achieve an output, which o3 approximates in concept.
One step closer to a future we do not yet understand.