Why I think AGI IS right around the corner

2025-07-30 • Abeer Agrawal

How AGI might come to be…

The conversation around artificial general intelligence (AGI) often swings between grand predictions and deep skepticism. For example, Dwarkesh Patel recently wrote a blog post on why he does not think AGI is right around the corner.

Specifically, Patel emphasizes that “models are stuck with abilities you get out of the box” and that “LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem.” Patel’s argument is that “the reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.”

However, this perception is rapidly being challenged by new research on techniques that let models keep improving after they ship. Reinforcement learning, particularly with methods like GRPO (Group Relative Policy Optimization), is at the core of this work. In fact, it's increasingly understood that RL can be qualitatively similar to how humans learn: the model notices which of its attempts succeeded or failed and updates itself accordingly.
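To make that mechanism concrete, here is a minimal sketch of the group-relative advantage at the heart of GRPO. It is a simplified illustration rather than a full training loop, and the function name and the 0/1 rewards are my own:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantage as used in GRPO: each sampled response is
    scored against the mean and std of its own group, so no separate
    critic (value) model is needed."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy example: four responses sampled for the same prompt, each given a
# hypothetical 0/1 reward (e.g. whether a checker accepted the answer).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# Responses that beat the group average get a positive advantage and are
# reinforced by the policy-gradient update; the rest are pushed down.
```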

The technical limitation of reinforcement learning in the past has been that it is fairly hard to design a reward function and to work through the mechanics of training. New techniques, however, are making this both easier and more effective.
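For context on what "setting up the reward function" means, here is a toy example of a verifiable reward of the kind traditional RL pipelines rely on (the function is my own illustration). Note that it only works in domains where answers can be checked mechanically, which is exactly the constraint the newer techniques below relax:

```python
import re

def exact_match_reward(model_output: str, reference_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the last number in the model's output
    matches the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return 1.0 if numbers and numbers[-1] == reference_answer else 0.0

print(exact_match_reward("2 + 40, so the total is 42", "42"))  # 1.0
print(exact_match_reward("I think the answer is 41", "42"))    # 0.0
```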

Recently, a team at UC Berkeley released a new technique called INTUITOR, which improves a model purely by having it reflect on its own thinking (i.e., not unlike a human). The model optimizes “intrinsic signals”, such as its own confidence in its answers, to improve results. This enables scalable, domain-agnostic improvement of LLMs far more easily than traditional RL techniques that require human feedback or verifiable supervision. You simply run a large set of prompts through the model, and it gets better at answering those questions over time.
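As a rough sketch of how such an intrinsic signal could be computed, consider the snippet below. The exact formulation in the INTUITOR paper may differ; this is my own simplified version that scores confidence as the distance of each next-token distribution from uniform:

```python
import numpy as np

def self_certainty(token_logits):
    """Hypothetical intrinsic confidence score: how far each next-token
    distribution is from uniform (log V minus entropy), averaged over the
    generated tokens. Peaked, confident distributions score high; flat,
    unsure ones score low."""
    scores = []
    for logits in token_logits:                      # one logit vector per generated token
        logits = np.asarray(logits, dtype=float)
        p = np.exp(logits - logits.max())
        p /= p.sum()                                 # softmax
        # KL(p || uniform) = log(V) - entropy(p)
        scores.append(np.log(len(p)) + np.sum(p * np.log(p + 1e-12)))
    return float(np.mean(scores))

confident_answer = [[8.0, 0.1, 0.1, 0.1], [7.5, 0.2, 0.1, 0.0]]
unsure_answer    = [[1.0, 1.1, 0.9, 1.0], [0.5, 0.6, 0.4, 0.5]]
print(self_certainty(confident_answer) > self_certainty(unsure_answer))  # True
```

Plugged into a GRPO-style update like the one sketched earlier, in place of an external reward, a score like this lets the model reinforce whichever of its own sampled answers it is most confident in, with no labels or verifier in the loop.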

We tested this technique at Levro AI (where we have built a platform that trains models for tool calling). Given a user’s API schema and a set of queries they would like answered, the platform handles training and generates custom small models that outperform SOTA models at tool calling.
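To illustrate the shape of that workflow, here is a purely hypothetical sketch; the class and field names below are illustrative and are not Levro's actual API:

```python
# Hypothetical sketch of the workflow described above; names are illustrative.
from dataclasses import dataclass

@dataclass
class ToolCallingJob:
    api_schema: dict        # e.g. an OpenAPI-style spec for the user's tools
    example_queries: list   # natural-language requests the model should handle

job = ToolCallingJob(
    api_schema={"get_balance": {"params": {"account_id": "string"}}},
    example_queries=["What is the balance on account 1234?"],
)

# From a job like this, a platform could sample candidate tool calls for
# each query, score them (e.g. schema validity plus a self-certainty-style
# intrinsic signal), and run a GRPO-style RL loop to fine-tune a small
# base model on the user's own APIs.
```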

The implications of these advancements are profound. As Dwarkesh Patel himself explains, “An AI that is capable of online learning might functionally become a superintelligence quite rapidly without any further algorithmic progress.” This is because unlike humans, who learn individually, AI models capable of online learning could potentially “amalgamate their learnings across all their copies,” meaning “one AI is basically learning how to do every single job in the world.”

Given the pace at which research into better RL techniques has progressed, it may not be so far-fetched to say that AGI is right around the corner.