AGI, Inc. - from MultiOn’s Div Garg ❇️
Plus: Div on his vision for the development of truly agentic systems...

CV Deep Dive
AGI, Inc. is a new applied research lab building the next generation of agentic systems. Founded by Div and positioned as a spin-out from the MultiOn team, AGI, Inc. is focused on the frontier of AI agents and the development of AGI. In Div’s words, AGI, Inc. is looking to tackle frontiers in human-AI collaboration and agent trust and reliability, and to pioneer new standards for agent-agent communication that allow agents to communicate directly with websites and services - all in service of building the standards for agent identity, payments, and trust.
As one of their first initiatives, AGI, Inc. is releasing their web agent eval benchmark, REAL Bench. REAL is a compact yet fully functional ‘mini-Internet,’ complete with near-exact replicas of the most popular real-life sites. It’s not just a playground; it’s a standardized test lab for browser agent systems, designed to supercharge our understanding of web performance, security, and next-gen AI interactions. AGI’s wider goal is to redefine human-AI collaboration by combining cutting-edge research with practical applications, aiming to create systems that go beyond simple chat interfaces to deliver real impact in work and personal life and become embedded companions.
In this conversation, Div shares how AGI, Inc. is tackling these challenges, the insights behind their current projects, and his vision for how AI systems can transform the way we work and live.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Div 💬
Div, welcome to Cerebral Valley! First off, introduce yourself and give us a bit of background on you and AGI, Inc. What led you to found AGI, Inc.?
Hey there, I’m Div. I am a computer scientist and AI researcher currently on leave from a Stanford AI PhD. I founded and ran one of the pioneering companies in AI agents, MultiOn AI, and recently spun out a new play - AGI, Inc.
You can just do things
— Div Garg (@DivGarg_)
2:15 AM • Dec 26, 2024
One of the big areas I’ve been focused on lately is AGI and how to build agent applications. This feels like the next frontier. There’s currently a multitude of horizontal infrastructure plays around agents and developer tooling, but the economic viability of these systems isn’t clear yet. There also haven’t been many clear-cut use cases established.
We’re focusing on understanding what it takes to make these applications highly reliable to the point where they’re ready for consistent use, and identifying some interesting applications to build on top of that foundation.
We’re working on a concept internally that we’re really excited about. There’s a lot of cool stuff we’re thinking about releasing. We’re running a very limited alpha at the moment—inviting folks to play with some exciting things we’re testing before we go broader.
How would you describe AGI, Inc. to the uninitiated developer or AI team?
AGI Inc. is a very ambitious project. If you think about it, AGI is likely going to happen in the next three years. You’ll probably see AGI super-intelligence emerge within that timeframe, maybe even sooner.
We want to figure out what things will look like afterward. How will human-AI collaboration work? What will be the socio-economic impact? How will these systems become massively adopted and widespread? We want to play a role in shaping the standards and exploring the implications of these systems.
We’re taking an early approach, knowing these developments are going to happen. The goal is to plan for a future where both society and technology are ready. We’re building various interesting things around this idea, including some community efforts like the agent evaluation systems that we’re releasing. We’re also working on a few other very cool initiatives.
A lot of what we’re doing focuses on making agents more trusted and reliable. Right now, people have tried agents, but most are disappointed. For example, even with something like Operator, it’s still more of a toy—it’s not yet something you can fully rely on for practical applications.
As the industry evolves, we want to help build trust in these systems. How can someone outside of the tech bubble trust agents? How do they know the agent will do what they need, especially in an enterprise context? What will the standards for agents look like? These are the kinds of efforts we’re championing right now.
Could you dive into that further - what does building the benchmark for AI Agents look like?
We’re releasing a framework called REAL Bench. The goal is to build a miniature simulacrum of the Internet. We’re creating real-life replicas of the top websites on the Internet where you can evaluate agents, get a task success score, and compete on a leaderboard. REAL Bench is our own miniature version of the Internet built for agents. Think of it as the ultimate testbed for next-gen AI on the web.
We want to use this as a way to build trust that agents can actually do useful things. These are real-life replicas, so if your agents (say they’re from AGI, Inc.) can do tasks on our replicas of these websites, then they should also be able to do those same tasks on the actual websites. We want this to be a bridge to show real-world value.
Right now, there’s no really good, centralized evaluation framework you can use. Most of them are either too academic or too similar to toy environments. We see this as a step toward real-world benchmarks. If we can give this kind of evaluation testbed to everyone in the ecosystem right now, it will really accelerate how fast things are moving. Think of it as the launchpad for the next agentic breakthroughs.
What would you say is the biggest bottleneck for true AI agent proliferation?
I’d say there are a lot of factors here. The first is evaluations. I am of the opinion that evaluations are critically important because unless you have a lot of really good evals, you can’t trust agentic systems, as they are pretty much stochastic black boxes. There are so many things that can go wrong.
Secondly, I would say guardrails. I think people have to think about how you can put guardrails around agent systems, especially for payments and secure access. That might involve auditing, ensuring user authorization, and maybe even adding 2FA constraints.
Doing all these things is a hard challenge. It also becomes a question of how to make sure the agent follows the instructions you’ve given it all the time and doesn’t violate principles—like sending the wrong email to someone or causing a negative business impact. Those are some of the interesting things we want to figure out.
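To make the guardrail idea concrete, here is a minimal sketch of a policy check that sits between an agent and the actions it wants to take. Everything in it is an assumption for illustration: the `Action` and `Guardrail` names, the payment limit, and the `user_confirmed` flag (standing in for a second factor such as 2FA) are hypothetical and not part of any AGI, Inc. or MultiOn product.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical types and policy, for illustration only.
@dataclass
class Action:
    kind: str            # e.g. "click", "send_email", "payment"
    amount: float = 0.0  # only meaningful for payments


@dataclass
class Guardrail:
    payment_limit: float = 20.0  # assumed threshold; larger payments need confirmation
    audit_log: List[str] = field(default_factory=list)

    def check(self, action: Action, user_confirmed: bool = False) -> bool:
        """Return True if the action may proceed, and record the decision."""
        needs_confirmation = (
            action.kind == "payment" and action.amount > self.payment_limit
        )
        allowed = user_confirmed or not needs_confirmation
        # Every decision is logged so it can be audited later.
        self.audit_log.append(
            f"{action.kind} amount={action.amount:.2f} allowed={allowed}"
        )
        return allowed


guard = Guardrail()
blocked = guard.check(Action("payment", 500.0))                        # no confirmation
approved = guard.check(Action("payment", 500.0), user_confirmed=True)  # user signed off
routine = guard.check(Action("click"))                                 # low-risk action
```

In this sketch a large payment is blocked until the user explicitly confirms it, routine actions pass through, and every decision lands in the audit log. A real system would need a far richer policy than a single threshold.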
There are a lot of vulnerabilities. Someone could prompt-inject an agent, or they could set up a website with instructions like “Send money to this script or link.” I’ve actually seen this happen before at MultiOn, and it can become highly risky for users. So how to avoid injection attacks is a core challenge with agentic tech.
I think the industry needs to collaborate and set the right standards. How will all of these things converge? That’s a big question and one I’d love to know more about.
What would you say is your core definition of AGI?
I would say that’s an interesting question. My read is maybe three years as a timeline for AGI to happen. It could possibly happen in a shorter time, but I think that’s my conservative timeline.
When I define AGI, it’s like a super-intelligent agent with high socio-economic value. It’s above the average human IQ, knows a lot, and is an expert in many different subjects. If I imagine it as a kind of assistant that’s superhuman—knowing much more than the average human typically would and able to reliably conduct actions—then I think that can be called AGI.
I do think it needs to have action capabilities. It has to be more than just a chat interface. It needs to actually be able to do things for you. A combination of this kind of intelligence and the ability to do things reliably, with close to 99% accuracy, is what I would call an AGI-like system.
There are a number of companies working on AI Agents. What would you say sets AGI, Inc. apart from a product or research perspective?
We’re tackling one of the most critical and under-explored challenges in human-AI collaboration: establishing true confidence in advanced, agentic systems. The real kicker is the pesky “last mile” problem: jumping from 95% reliability to 99% is the most expensive and time-consuming part. Then, every extra decimal point on top of 99% is a crazy exponential climb. But that’s the nature of pushing boundaries. If it were easy, everyone would do it.
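A quick back-of-the-envelope calculation shows why those last few points of reliability matter so much. The sketch below is only an illustration: the batch size of 10,000 tasks, the ten-step workflow, and the assumption that steps fail independently are all hypothetical choices, not figures from AGI, Inc.

```python
def expected_failures(reliability: float, tasks: int = 10_000) -> int:
    """Expected number of failed tasks out of `tasks` at a given success rate."""
    return round((1 - reliability) * tasks)


def chain_success(step_reliability: float, steps: int) -> float:
    """Chance that a multi-step task succeeds if every step must succeed
    and steps fail independently (a simplifying assumption)."""
    return step_reliability ** steps


for r in (0.95, 0.99, 0.999):
    print(
        f"{r:.1%} reliable: {expected_failures(r)} failures per 10,000 tasks, "
        f"{chain_success(r, 10):.1%} success on a 10-step workflow"
    )
```

Under these assumptions, 95% reliability means 500 failures per 10,000 tasks, 99% means 100, and 99.9% means 10. On a ten-step workflow, 95% per step works out to roughly a 60% chance of finishing, against roughly 90% at 99% per step.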
At the same time, product success with agents hinges on creating user interaction paradigms that feel humane and intuitive. A user’s trust isn’t just about technical robustness; it’s about the overall experience—how naturally the AI communicates, how well it anticipates needs, and how consistently it honors human values. That blend of extreme reliability and thoughtful design on a focused set of starting use cases is what sets AGI, Inc. apart.
Given the excitement around the next generation of AI Agents, how does this factor into your product vision for AGI, Inc.?
DeepSeek was the first method that actually showed reinforcement learning (RL) works with large language models and is actually applicable at scale. A lot of people have tried before, but they didn’t figure out the right ways to make RL work with language models. That was a fundamental breakthrough because now people understand how reinforcement learning can be utilized in reasoning models for things such as math reasoning and many other applications.
We’re very bullish on that direction, especially in how it can be applied to agentic systems. That’s been something very new. We’re also big believers in local models. As technology moves really fast, we’ll have models that run locally at very high inference speeds, around 1000 tokens per second or more. That will change the game a lot.
What would you say has been the hardest technical challenge around building the next-generation evals platform for AI agents?
When it comes to evals, a lot of it is about mimicking the interaction. There is a plethora of different flows and interactions on a website. So it’s about figuring out: how do you make sure this is actually valid? There’s a lot of focus on task diversity, so that we have a good, comprehensive set of tasks and are building evaluations for each task.
For example, take a task like “Book me an Uber from destination A to destination B.” How can you ensure that the evaluation benchmark can measure whether this task was completed correctly or not? We have to build evaluation harnesses and reward functions to measure whether your agent goes and attempts the task and whether it succeeded or failed.
There’s a lot that goes on behind the scenes to figure out how to measure these things.
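To give a feel for what an evaluation harness and reward function might look like, here is a minimal sketch built around the ride-booking example. It is an illustration under stated assumptions, not REAL Bench’s actual API: the `Task` type, the state keys, and the stand-in agents are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout; this is not REAL Bench's actual API.
State = Dict[str, str]


@dataclass
class Task:
    prompt: str
    reward: Callable[[State], float]  # 1.0 for success, 0.0 for failure


def ride_booked(state: State) -> float:
    """Reward function: inspect the final site state, not the agent's click path."""
    ok = (
        state.get("ride_status") == "confirmed"
        and state.get("pickup") == "A"
        and state.get("dropoff") == "B"
    )
    return 1.0 if ok else 0.0


def evaluate(agent: Callable[[str, State], State], tasks: List[Task]) -> float:
    """Run each task from a fresh, empty site state and return the success rate."""
    scores = [task.reward(agent(task.prompt, {})) for task in tasks]
    return sum(scores) / len(scores)


def scripted_agent(prompt: str, state: State) -> State:
    """Stand-in for a real browser agent: books a ride from A to B."""
    state.update({"pickup": "A", "dropoff": "B", "ride_status": "confirmed"})
    return state


def wrong_destination_agent(prompt: str, state: State) -> State:
    """Stand-in agent that confirms a ride to the wrong place."""
    state.update({"pickup": "A", "dropoff": "C", "ride_status": "confirmed"})
    return state


tasks = [Task("Book me an Uber from destination A to destination B.", ride_booked)]
good_score = evaluate(scripted_agent, tasks)
bad_score = evaluate(wrong_destination_agent, tasks)
```

The design choice worth noting is that the reward checks the outcome on the replica site (was the right ride confirmed?) rather than the exact sequence of clicks, so different agents can take different routes to the same result and still be scored fairly.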
You’re best known for your work on MultiOn. Could you walk us through how AGI, Inc. is different from MultiOn?
For AGI Inc., we’re focusing on next-level Human-AI collaboration. The future belongs to those who can harness advanced AI for truly game-changing end-user applications. We’re all-in on building that reality. Last year, with MultiOn, it was very much about developer APIs and a lot of enterprise-focused work. Now, we’re thinking about what the next frontier is. AGI is focusing on frontier agentic use cases and products, and I think we’re kind of like a research lab at the moment. We’re doing a lot of research on agent fine-tuning and reinforcement learning, working on a lot of interesting things internally that we’ll be releasing.
We’re thinking about how to productize and commercialize frontier agent tech to enable the next generation of human-collaborative experiences. There’s a lot of iteration happening on product design as well.
Right now, we have a very small team, and a lot of incredible investors and top-tier talent joining AGI, Inc. We’re building a cracked team of serial founders and seasoned engineers—keeping it lean by bringing in only the absolute best.
If you’re up for taking on massive challenges and breaking new ground with us, fire away an email at [email protected]. We’re always on the lookout for unstoppable builders!
We’re very ambitious, as you can probably tell from the name. We have some of the initial founding members from MultiOn running this from the start. For now, we’re staying small as we scale this out.
Lastly, tell us the story of how you closed the AGI, Inc. corporation name.
I’ll just say it was a miracle: we simply tried to get the name and, incredibly, it was available. Maybe it was just destiny, haha. We actually got a $10 million offer to sell it, which was fun :)
Just got offered $10 mil to sell the name … 🤯
— Div Garg (@DivGarg_)
7:03 PM • Jan 5, 2025
Anything else you'd like our readers to know about the work you’re doing at AGI, Inc.?
We have a lot of very exciting things that we’re working on and looking to release soon. At AGI, we want to think about what the next generation of human-AI collaboration will look like and how trustworthy, advanced agentic systems can be deployed to unlock large-scale societal value.
Right now, if you think about agent experiences, they’re not very collaborative. Our interactions with AI agents are basically stuck in first gear: there’s no real synergy, no genuine collaboration. They can answer questions, sure, but they don’t collaborate with you like a partner. The real question is: when AI starts taking action alongside us in our everyday work and personal lives, what does that human-AI interaction look like? How do we co-evolve with these advanced agents?
What will that look like? How will these kinds of systems evolve? That’s the frontier we’re pushing on: designing the next generation of AI that isn’t just reactive but proactively drives outcomes for you, and figuring out the paradigms for working with more capable, next-gen AI systems personalized to you. It’s a massive shift in how we view intelligence and collaboration, and it’s exactly what we’re tackling head-on.
Conclusion
Stay up to date on the latest with AGI, Inc., learn more about them here.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.