OpenAI Q Star Could Have a Mostly Automated and Scalable Way to Improve

The battle at OpenAI was possibly due to a massive breakthrough dubbed Q* (pronounced “Q star”), which could be a precursor to AGI. In short, OpenAI might have found a way for machines to navigate complex problems without running into the typical obstacles. The rest of this post unpacks what that might mean.

Q* enables OpenAI’s large language models (LLMs) to directly handle problems in math and logic. LLMs previously needed to use external software to handle the math.

If so, OpenAI could have a new, mostly automated, and scalable way for its models to improve.

Q* seems to be the system that has given Microsoft the confidence to invest $50 billion per year to scale the solution to AGI or ASI (aka human-level or beyond-human intelligence).

Q-learning has existed for decades already; it’s just a basic reinforcement learning algorithm. A* is also fairly old: it’s a heuristic-based pathfinding algorithm.

In typical engineering fashion, they may have found an intersection of the two and named it Q*. This is total speculation, but if this is the “breakthrough,” it means OpenAI built an algorithm that can feed a highly efficient heuristic into Q-learning. That is MASSIVE.

I’ll spare you the boring details. What does this mean in real terms?

Learning is a long path: a machine must accomplish many small steps to achieve a larger task, and if those steps are not pre-determined, the machine will try many combinations of steps to reach its goal. Reinforcement learning “reinforces” the optimal steps, bringing the machine closer to its goal. Think of a child learning to walk: they may fall over many times while finding their balance.
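To make that concrete, here is a minimal sketch of classic tabular Q-learning, the decades-old algorithm mentioned above. The states, actions, and hyperparameters are hypothetical placeholders for illustration, not anything from OpenAI:

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor
EPSILON = 0.1  # exploration rate

ACTIONS = ["left", "right", "up", "down"]
Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value

def choose_action(state):
    # Epsilon-greedy: usually exploit the best-known action, but
    # sometimes explore a random one (the "falling over" trials).
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Core Q-learning update: nudge the estimate toward the observed
    # reward plus the discounted value of the best next action.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# e.g., after taking "right" in state (0, 0) and landing in (1, 0):
update((0, 0), "right", 0.0, (1, 0))
```

Notice there is no foresight here: the machine only learns which steps were good after stumbling through them many times.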

A heuristic is an estimate a machine uses to judge how close it is to success. As you improve the heuristic, the machine assesses its progress more accurately and wastes less effort.
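This is exactly how A* uses a heuristic in pathfinding. Below is a minimal sketch on a hypothetical 5x5 open grid using the classic Manhattan-distance heuristic; the grid and coordinates are illustrative assumptions:

```python
import heapq

def manhattan(a, b):
    # Heuristic: estimated remaining cost from a to b.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, neighbors):
    # Frontier is ordered by (cost so far + heuristic estimate to goal).
    frontier = [(manhattan(start, goal), 0, start)]
    best_cost = {start: 0}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        for nxt in neighbors(node):
            new_cost = cost + 1  # uniform step cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + manhattan(nxt, goal), new_cost, nxt))
    return None  # goal unreachable

def neighbors(p):
    # 4-connected moves on a hypothetical 5x5 open grid.
    x, y = p
    return [(x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

print(a_star((0, 0), (4, 4), neighbors))  # shortest path length: 8
```

The better the heuristic estimates the true remaining cost, the fewer dead-end states A* bothers to expand.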

What Q* might have done is bridge the big gap between Q-learning and pre-determined heuristics. This could be revolutionary, as it could give a machine “future sight” into the optimal next step, saving it a lot of effort. Machines could stop pursuing suboptimal solutions and pursue only optimal ones; all the “failure” trials machines used to need (e.g., trying to walk but falling) would fold into the effort of “success” trials. OpenAI might have found a way to navigate complex problems without running into the typical obstacles.
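How might such a bridge look? This is still pure speculation, but one well-studied way to feed a heuristic into Q-learning is potential-based reward shaping (Ng et al., 1999). The sketch below reuses the gridworld heuristic from above; it is a hypothetical illustration, not OpenAI’s actual method:

```python
GOAL = (4, 4)   # hypothetical goal cell in a gridworld
GAMMA = 0.99    # discount factor; must match the Q-learning update

def potential(state):
    # The heuristic doubles as a potential function:
    # higher (less negative) when closer to the goal.
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

def shaped_reward(env_reward, state, next_state):
    # F(s, s') = GAMMA * phi(s') - phi(s). Adding F to the raw reward
    # pays the machine for steps the heuristic scores as progress,
    # steering exploration toward the goal while provably leaving the
    # optimal policy unchanged (Ng et al., 1999).
    return env_reward + GAMMA * potential(next_state) - potential(state)

# Drop-in use: pass shaped_reward(...) to the Q-learning update above
# in place of the raw environment reward.
```

The effect is the “future sight” described above: the heuristic tells the learner which direction looks promising before it has stumbled there.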

There is so much good research out there. If you’re interested in learning more, Q-learning and A* are both highly documented and well researched, and they are a component of most university-level CS curricula.

In the last few years, research teams have been trying to bridge the two using hyper-heuristics; you can Google those as well. Good luck, as this is a deep rabbit hole.
