OpenAI’s Codex Lets One Engineer Ship Like a Team of Ten ⚡
Plus: Thibault Sottiaux on building OpenAI’s Codex, deploying intelligence across the full software lifecycle, and the future of OpenAI...

CV Deep Dive
Today, we’re talking with Thibault Sottiaux, Engineering Lead of Codex at OpenAI.
Codex is OpenAI’s agentic coding system designed to function like a highly capable software teammate. Built on top of OpenAI’s latest frontier models and deeply integrated across developer workflows, Codex helps with everything from scoping and planning to implementation, code review, verification, and large-scale refactoring. Its goal is to dramatically increase the amount of intelligence deployed per developer and compress the entire software development lifecycle from idea to production.
Today, Codex is used by nearly all software engineers at OpenAI, reviews essentially all production code, and has enabled small teams to build and ship systems at speeds that were previously not possible. It powers multiple surfaces, including IDE extensions, CLI tools, and cloud-based agents, and is increasingly embedded into automated systems like CI and GitHub Actions.
In this conversation, Thibault shares his journey from DeepMind and Google to OpenAI, how Codex evolved from early prototypes into a core internal platform, what it takes to build reliable agentic systems, and why the future of software development will be defined by small teams leveraging enormous amounts of intelligence.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Thibault 💬
Thibault, welcome to Cerebral Valley! Could you start by telling us about your journey to OpenAI and what led you to lead the Codex team?
The journey started, I’d say, a long time ago. When I was a kid, I dreamed about being able to talk to my computer. I was seven and really wanted to play, but I wanted more than just playing chess. Chess engines were already a thing, and the first program I built was something that would emulate talking to me. Over time, I got more and more interested in AI.
I studied AI at university eventually, although I didn’t attend class that much. I launched a startup doing supply chain management, and then joined Google, in a role that had nothing to do with AI. At the same time, DeepMind had this huge presence in London, so I eventually joined DeepMind. During its early foundational years, there were only around 400 of us.
It felt like a very separate thing with a common mission. Those were the golden days of DeepMind, with grand challenge after grand challenge: AlphaStar, AlphaGo. It was super inspiring. Then OpenAI decided to take large language models and push them to the extreme, really believing in the scaling laws. That disrupted the little bubble DeepMind was in, which was focused on games and grand challenges, and undid, I’d say, a lot of what it was building at the time.
Then came Gemini. I led the Gemini human data team there for a bit, as it grew to hundreds of people in the run-up to shipping the first Gemini models.
At some point, I was like, hey, I want to join something that feels closer to my values, and where the talent density is higher. When I talked to people at OpenAI, they genuinely believed in the mission, which is to benefit all of humanity with this progress. So I decided to join. This was just before reasoning models, before the o1 model.
In typical OpenAI fashion, when you join, you’re immediately part of a sprint. It is hectic and chaotic. So much is happening. That was a magical moment, and I stuck with it. I was building tools primarily to help researchers move faster. Around late last year, I became obsessed with the idea that the bottleneck was not the models.
The models were going to continue to improve. What I was seeing at OpenAI was pretty magical, and it was very clear that we had not hit any slowdown. The true bottleneck was how we deploy this safely into the world in a new way. I became obsessed with building infrastructure and products, and started working on a few prototypes that were pre-Codex.
Over the course of this year, this really started to materialize. We decided to ship a first version, which was Codex Web. It was a very forward-looking version of Codex, fully in the cloud and uninterruptible, going from prompt to pull request in one go. We then pivoted a bit and worked on other things in the meantime. That was my journey to getting here.
Now I lead the engineering team for Codex. We are about 30 engineers, and we run it as a single unit. We have research and product engineers all working together. I find that to be an effective way of doing things, especially since it is still very early.
How would you describe Codex to a developer who's new to it? What’s the core problem you're solving?
Codex is there to help with whatever you are working on, whether it is a new idea or an existing project. It is like a teammate: it helps with scoping things out, planning, coming up with good ideas, implementing, reviewing your code, and acting as a sounding board. It is an extremely flexible and malleable teammate that just happens to be a virtual one.
Codex began as an internal tool. Can you share the story of its adoption inside OpenAI, from the first users to 95% of your engineers today?
The very first version actually consisted of two separate projects. One was in the terminal, and it was not truly agentic. It was based on o3 at the time. The way it worked was that we applied some heuristics and stuffed the model’s context so that it could zero-shot problems for you. That was quite popular. Immediately after its release, it had around 50 users, and that group grew over time.
There was also another version being built, a prototype that was an agent which would collect its own context and work in its own virtual sandbox. This also grew to hundreds of users, and this was just early this year. Then we decided to combine these approaches and build one team and one agent.
From there, adoption continued to grow over the year and has now reached the point where around 95 percent of software engineers at OpenAI use it, and it reviews 100 percent of our code. Many teams actually don’t allow disabling it, because it catches so many flaws and critical issues in people’s code that it’s considered essential; humans get quite distracted.
One thing we have seen is that it is not just that more people are using it, but that people are using it more. As a result, we have been able to deploy a lot more intelligence per user, especially in the last few months. That is what we have seen with Sora, and we’ve seen it with Atlas as well.
Entire apps, entire codebases have been spun up from scratch where the majority, or even 100%, of the code was written by Codex. It was a shockingly small team driving that progress and development. What we have seen is a shift from big teams to small teams that are able to leverage more and more intelligence.
What are the most impactful ways your own teams use Codex? Any favorite stories?
I’d say my favorite example is that the IDE extension was built by one person who was using Codex to the max to plan, write code, and find bugs by running lots of background instances. The code was getting written, bugs were being found, and fixes were being applied, all at tremendous speed. At some point, I was looking through the PRs and there were more than 30 PRs per day being generated and merged. It was a surprisingly bug-free and super successful release. The IDE extension has been our most successful surface, so that’s probably my favorite example.
Another favorite moment is that every time we ship something, we have Codex find critical issues in the days leading up to a launch. These are issues where, if we had found them after launch, it would have been quite a disaster. It’s allowing us to move much faster.
The Codex CLI seems to be a key entry point for developers. For a startup team adopting Codex, what’s the onboarding experience like? What features should they try first?
We recommend starting with the IDE extension, such as the VS Code extension, which works with all VS Code forks, or with the CLI. They are the lowest-friction options: you just install them. As a first query, we recommend asking a very simple question to understand a codebase, get some thoughts on a design, or implement a very simple feature.
After that, I recommend ramping up the complexity of the queries and prompts you give Codex. You can go surprisingly far. Codex can work for hours to scope things out, find relevant code, or make very large-scale refactors for you.
Once you get comfortable with the limits, if you want to push further, there’s a lot of documentation you can dive into to customize things further. There is a fairly complex configuration language that lets you go deeper and really make it your own. Experts use the SDK to control Codex in a fully programmatic way and hook it into all sorts of systems, including GitHub Actions. For us, it runs tasks on demand and continuously in an automated way. We deploy intelligence to handle all sorts of trivial tasks for us, but that’s definitely on the more advanced side.
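For a sense of what that programmatic path can look like, here is a minimal TypeScript sketch of driving an agent from a script, for example inside a CI job or a GitHub Action. The package name, class, options, and return fields below are assumptions for illustration rather than a documented API surface, so check the current SDK docs before relying on them.

```typescript
// Hypothetical sketch of controlling a coding agent programmatically from CI.
// Package, class, option, and field names are assumptions for illustration.
import { Codex } from "@openai/codex-sdk";

async function triageFailingTests(repoPath: string): Promise<void> {
  // Assumes credentials are picked up from the environment.
  const codex = new Codex();

  // Start an agent thread rooted in the repository we want it to work on.
  const thread = codex.startThread({ workingDirectory: repoPath });

  // Hand the agent a scoped task; in CI this prompt would typically be
  // assembled from the failing job's logs.
  const result = await thread.run(
    "Run the test suite, identify any failing tests, and propose a minimal fix as a diff."
  );

  // Surface the agent's final answer so the CI job can post it as a comment.
  console.log(result.finalResponse);
}

triageFailingTests(process.cwd()).catch((err) => {
  console.error(err);
  process.exit(1);
});
```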
How do you measure Codex’s impact, and what metrics are you focused on?
Right now, we are focused on the amount of intelligence deployed in service of users. We obviously also measure how many users we have and how engaged they are, but fundamentally what we are pushing on is having an edge when it comes to the agent. We are primarily focused on building the best model and having the SOTA agent in the world.
To do that, we track a whole variety of evals. We’re constantly asking how much we’re pushing the frontier of agentic coding.
The story of Codex is a solid case of dog-fooding a product. What are the advantages and challenges of this internal-first development approach?
The advantage is that once you figure it out, you can move at speeds that were previously not possible. The challenge is that you have to figure it out. It requires trying various techniques, different ways to engage with Codex, and different ways to leverage Codex. It also requires configuring a lot of the pieces.
You should not only lean into Codex for code generation, but also for code verification and code review. If you only accelerate code generation, you end up introducing a bottleneck somewhere else. You really need to think about the entire process of developing software from zero to one.
That means understanding how you go from a feature that does not exist to a feature that is stable in production, and how you can leverage intelligence at every stage of that development.
Your team has also been open about challenges like degraded speeds. Can you talk about the architectural decisions that allow Codex to scale and the hurdles you've faced meeting demand?
We published a fairly thorough public retro. When we had reports of degradations, it was not just degraded speeds; it was also degraded performance. As much as it’s delightful to hear that people think it is a simple system, with a box you type into to receive some code back, there is a lot of complexity under the hood. There’s a significant layer between the GPU and the system actually being able to act safely on your computer.
One of the challenges is that fundamentally we are dealing with a non-deterministic system. This introduces challenges when it comes to pinpointing the exact root cause of a potential issue. First, you need to confirm that it is actually an issue. If I am not seeing it and you are seeing it, that might be because of non-determinism, or it might be because something is slightly different somewhere along the path between you and the computer.
There are also challenges related to differences in hardware, caching layers, and routing. Different rules apply depending on which region you are in. Your prompts and workloads might also be very different from mine. There are differences at every part of this stack.
This requires us to really dig into the details. We came up with over 30 possible explanations that we needed to investigate as potential root causes as part of that analysis. A lot of this is published online in a piece called Ghost in the Codex Machine. I recommend people read it. I think it is a fun read, but it is also interesting for anyone who is curious about why this is so complex.
It does a good job of helping people understand that it is not as simple as typing something in and getting a response back.
What has been the hardest technical challenge in evolving Codex from a code completion tool into a more agentic system?
There are two main challenges. The first is pushing the frontier on the model and the agent itself. This requires co-designing model capabilities and training together with the tools. We are finding that when we do this, we reach better and higher performance. It definitely keeps us awake. We’re pouring tremendous amounts of energy into it.
The second challenge is figuring out what the right interface is for this ever-expanding system that can produce increasing economic value over time. The question is what the right interface is between this complex, non-deterministic system and humans. I’d say we haven’t cracked that one yet.
It still feels like the product needs to evolve. This is a highly creative period where we have to try a lot of different things and keep thinking about the right way to interface with what currently might be a single agent, but could evolve into societies of agents and much more complex systems. Those are the two main challenges: continuing to push the frontier, and figuring out the right interface and the pattern of collaboration.
With projects like AGENTS.md and models like GPT-5.1-Codex-Max, you're improving rapidly. How do you see Codex and OpenAI as a whole evolving in the next 6-12 months, and what should developers be most excited about?
We expect that we’ll continue to launch, potentially at the same or even an accelerated pace. One challenge is making sure we do not drive fatigue. We are thinking a lot about how we can keep our velocity while also making Codex more meaningful for users.
One of the ways we think OpenAI will continue to change is that we will see more examples like Atlas and Sora, where very small, highly talented teams of three or four engineers move mountains. A lot more will happen automatically and in the background. Right now, much of the work starts with a user having an idea and deciding to prompt Codex to get something done. Next year, we expect Codex to proactively come up with things to do that are helpful enough that you actually want to pay attention to them.
What we are always optimizing for is return on investment, where the investment is human attention. We constantly ask what you get out of the attention you spend.
Another thing that is going to happen is that agents will become so reliable that they will be able to run for days, if not weeks. That will very likely completely change how you interact with your agent. We’ll also see much more compute deployed per employee, probably an order of magnitude more.
Based on your experience scaling Codex internally, what advice would you give to startups looking to build an AI-native engineering team?
A lot of it comes down to good engineering. What we have learned is that it is very important to start from a solid codebase that is properly architected with the right tooling. It’s key to invest in tests that run efficiently, have good coverage, and are semantically correct, not just change detectors. And it’s worth investing in the ability to sandbox and run the majority of your stack in a virtual machine.
Those are the three areas I would prioritize. Then experiment, try different things, and try to automate all parts of the software lifecycle, from ideation all the way to deployment in production. Even after it is deployed, maintaining the software matters. Do not just focus on code generation.
If you only focus on code generation, you aren’t going to be moving fast enough, I’d say, in 2026.
Tell us about the team and culture you’re building. What do you look for in new members joining the Codex team?
What I primarily look for is a desire to do something that has never been done before, a level of ambition, and a willingness to dream. Traditionally, people might see that as counterproductive in teams, but I look for the most AGI-pilled individuals in the world.
We’re going to revolutionize how software is built. Revolutionizing how software is built also means being part of something that will generate incredible economic value and real benefits for humanity. I’m looking for people who are driven by that kind of impact, who are deeply value-aligned, and who are not just there for the craft alone.
That being said, to work on the Codex team, we are also looking for top-tier talent. Especially now, everyone is augmented by Codex itself. When you give Codex to the right individual, they can do so much more, and we can accelerate even further.
Anything else you'd like our readers to know about the work you’re doing with Codex at OpenAI?
We’re excited to continue pushing the frontier on models and to keep up the rapid release of new models that work very well in Codex. One thing I would share is that, as capabilities evolve very rapidly, it is important to keep an open mind about what is possible today that may not have been possible just a few months earlier.
That also means that if you have not had success with something before, it’s worth trying again. Every time we ship a major release, whether it is a new model, a harness improvement, or a new product, we believe it significantly changes what you can achieve as an individual, even over just the next few months.
So keep an open mind and try things again. I also recommend keeping it simple. Just talk to the system.
Conclusion
To stay up to date on the latest from OpenAI, follow them here.
If you would like us to ‘Deep Dive’ a founder, team or product launch, DM our chatbot here.