Antithesis - the last word in autonomous software testing 🎛️

Plus: CEO/Founder Will on why he believes AI-driven dev tools will benefit from rock-solid verification...

CV Deep Dive

Today, we're talking with Will Wilson, Founder and CEO of Antithesis.

Antithesis offers what some might call a radically new paradigm for testing and verifying complex software. By running your applications in a deterministic hypervisor and using an intelligent search to systematically break your code, Antithesis promises to catch the rare and seemingly impossible bugs that slip through standard integration tests or chaos engineering experiments. Will brings extensive experience from FoundationDB (acquired by Apple and recently disclosed to be underpinning all of DeepSeek's infrastructure) and Google's Spanner team, and much of the core Antithesis crew likewise hails from that same background.

Key Takeaways

  • Preempt Production Incidents with Deterministic Testing: Antithesis simulates all the things that could go wrong with your distributed system in a deterministic environment. This eliminates the reproducibility headaches that plague most large-scale system testing.

  • Autonomous Search: Rather than you writing a million test cases, Antithesis actively seeks out new and interesting behaviors in your software on its own, then shows you precisely how they happened.

  • Already Helping Startups & Enterprises Alike: While the initial focus is on larger customers, small companies (like Turso DB) have also used Antithesis to de-risk bold projects like total rewrites.

  • Potential for AI Synergy: With GenAI producing ever more code, often from less expert devs, demand for deeper, more robust testing only grows. Meanwhile, Antithesis sees opportunities to use AI to generate novel test inputs and fix discovered bugs in a closed loop.

  • Future Expansion Beyond Distributed Systems: Though best known for finding fault tolerance bugs, Antithesis's approach can be leveraged to test a wide range of applications, from mobile apps to games to websites.

In this conversation, Will explains how the company's unique technology was inspired by the "impossible" distributed database days at FoundationDB, how intelligent search differs from random chaos testing, and why he believes AI-driven dev tools will benefit from rock-solid verification.

Let's dive in ⚡️

Read time: 8 mins

Our Chat with Will 💬

Hey Will - welcome to Cerebral Valley! Could you give us a bit of an intro on yourself and what led you to found Antithesis?

I'm Will - a software engineer by background who never really planned to start a company, but here I am. A lot of what inspired Antithesis came from my time at FoundationDB (acquired by Apple in 2015). We built a distributed database with ACID transactions and high fault tolerance; people said it was impossible because of the CAP theorem, but we did it. The key was this sophisticated autonomous testing system we created. It simulated arbitrary failure conditions, searched the entire "state space" to trigger weird bugs, and let us deterministically reproduce any situation. That was a total game-changer. After Apple, I worked on Google Spanner and noticed they didn't have anything comparable. We realized there was a huge opportunity to bring that style of testing to the broader world.

A ton of my colleagues here are from FoundationDB, including my co-founder who used to be my boss there. Come to think of it, my boss at Apple now works for me as our VP of Engineering, and another former boss is an investor. I guess I have good relationships with bosses! But seriously, the vision is to free devs from writing endless test cases by letting an intelligent system break their software in a reproducible environment.

What exactly is Antithesis for the uninitiated developer - like an elevator pitch?

We flip the usual approach to testing on its head. Normally, you write tests for specific cases you think might matter, then hope you covered enough ground. In practice, you deploy to production and discover all kinds of insane edge scenarios you never imagined: routers delaying packets, machines shutting down mid-request, user input with 2^8 characters.

Antithesis starts from the end goal: you specify what your software is supposed to do or not do (e.g., "Don't crash"), and we systematically explore how to break that rule. We do this by injecting weird environment faults, bizarre inputs, or exotic usage patterns. Because we run everything in a deterministic environment, once we find a bug, you can re-run that exact scenario. No more "works on my machine, breaks on yours." It's like combining property-based testing with advanced fault injection, but at scale for big real-world apps.
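
To make the property-based-testing flavor of that concrete, here's a minimal sketch in Python using the open-source hypothesis library - purely an illustration of the general idea, not Antithesis's own API, and the KVStore class is a made-up stand-in for whatever system you're testing:

```python
# A minimal sketch of the "state a property, let the machine hunt for a
# counterexample" idea, using the open-source hypothesis library rather than
# Antithesis's own tooling. KVStore is a toy stand-in for the system under test.
from hypothesis import given, strategies as st

class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

# The property we assert: after any sequence of writes, reading the key from the
# most recent write returns the value we just wrote. hypothesis generates and
# shrinks input sequences looking for a violation, instead of us enumerating
# cases by hand.
@given(st.lists(st.tuples(st.text(), st.integers()), min_size=1))
def test_last_write_wins(writes):
    store = KVStore()
    for key, value in writes:
        store.put(key, value)
    last_key, last_value = writes[-1]
    assert store.get(last_key) == last_value
```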

You mentioned a deterministic state space that you rigorously test through. What does that mean, exactly?

Conventional software is inherently non-deterministic. In real life, a program can spawn threads that run in arbitrary orders, send and receive network traffic with random delays, check the time at different moments, or generate random numbers. That means bugs often appear sporadically, or only 1 in 1,000 times (but Murphy's Law means it'll happen at the exact wrong moment in prod).

We built a hypervisor that forces your whole system to be deterministic - thread scheduling, network responses, everything. If a thread is about to run, we decide the scheduling in a repeatable way, so if we see a bug, we can replay the entire scenario. That transforms debugging: no more ephemeral "heisenbugs." Our approach systematically covers a huge range of possible interleavings and fault conditions.
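
A deliberately simplified way to picture why determinism enables replay (a toy sketch, not how Antithesis's hypervisor actually works): route every nondeterministic choice through a single seeded source, and the same seed reproduces the same run.

```python
# A toy illustration of seed-driven determinism, not Antithesis's real mechanism:
# if every nondeterministic decision (which task runs next, how long a message is
# delayed) comes from one seeded PRNG, the whole run can be reproduced exactly.
import random

def run_simulation(seed, tasks):
    rng = random.Random(seed)
    log = []
    pending = list(tasks)
    while pending:
        # The "scheduler": pick the next task to run using the seeded RNG.
        task = pending.pop(rng.randrange(len(pending)))
        # The "network": inject a deterministic, seed-derived delay.
        delay_ms = rng.randint(0, 50)
        log.append((task, delay_ms))
    return log

# Same seed, same schedule and same delays - so any bug the run triggers
# replays identically.
assert run_simulation(42, ["a", "b", "c"]) == run_simulation(42, ["a", "b", "c"])
```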

Is your system primarily geared toward enterprise customers, or could smaller shops or individuals use it too?

Right now, we're mostly oriented toward enterprise, because that's where we see the biggest immediate ROI - large companies want to reduce outages and production firefighting. We do have some small startups as well, including pre-seed ones, but the overall product can be quite expensive in its current form. Over time, we plan to make a more polished, self-serve version with a cheaper or free tier. That's definitely on the roadmap.

What are the key output metrics you look for to determine success, and do you have any success stories that show a drastic change in those metrics?

Number one is reducing production incidents and outages. If you're shipping fewer bugs and spending less time on firefighting, that's huge. We had one customer tell us that their "support people were getting bored", and that made my day.

Another is developer productivity: how much time are teams spending on writing and maintaining tests? Or how much are they wasting triaging and investigating weird, non-reproducible issues? The latter task especially tends to fall on the most senior and valuable members of the team. When we free them up to write features instead, that's a huge win.

Then there's the "frontier of what's possible": can devs tackle projects they never dared attempt before? Turso DB is a great example. They wanted to rewrite SQLite from scratch, but initially thought it was too risky - like, how do you test something that big and complicated? After working with us, they felt safe to do it, because we systematically hammered on the new code until it was stable. That's the kind of "frontier" effect we love seeing.

This sounds like a novel approach - who do you see as competition or the "incumbents" in the testing space?

There's not much direct competition doing exactly what we do. The biggest "competitor" is often a homegrown system at large companies, usually some janky fault-injection approach they built internally. Then you have chaos testing, popularized by Netflix, which is basically introducing random disruptions in production. For smaller or stateless things, there's fuzzing or property-based testing, but those rarely scale to big distributed apps.

We do see some teams building their own deterministic simulation, but that's an enormous effort. For most, it's easier to go with a vendor approach. So in short, there's no one else systematically combining deterministic simulation, intelligent search, and environment fault injection the way we do.

You mentioned a moat around the insights you gain from multiple customers' code. Can you share any interesting or surprising insights from that?

Every new customer adds more diversity to our "training corpus." That means we're less likely to overfit on just one type of system. We do see broad patterns - like how pure uniform randomness is actually not that useful. If you simply drop 5% of packets or randomly pick functions to call, you might never hit the weird corner where you call function A 100 times in a row without an intervening call to function B.

Structured randomness is more powerful. We'll do, for instance, periods of complete disconnection, then normal traffic, or sequences that call function A repeatedly in case it contains a memory leak that gets cleaned up by function B. This can reveal deeper bugs that uniform sampling would never touch. Another big theme is that real-world usage is rarely a simple distribution - so we design our search to systematically produce "pockets of chaos" instead of random mild chaos everywhere.
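
As a rough sketch of that contrast - illustrative only, not Antithesis's actual fault model - compare a uniform 5% packet-drop schedule with one that alternates windows of total disconnection and normal traffic:

```python
# Illustrative contrast between uniform and structured randomness, not
# Antithesis's real fault injection. Each schedule is a list of booleans where
# True means "drop this packet".
import random

def uniform_drop_schedule(num_packets, drop_prob=0.05, seed=0):
    # Every packet independently has a small chance of being dropped.
    rng = random.Random(seed)
    return [rng.random() < drop_prob for _ in range(num_packets)]

def pockets_of_chaos_schedule(num_packets, seed=0):
    # Alternate whole windows of disconnection with windows of normal traffic,
    # which is far more likely to exercise timeout and reconnect code paths.
    rng = random.Random(seed)
    schedule, dropped = [], False
    while len(schedule) < num_packets:
        window = rng.randint(5, 50)
        schedule.extend([dropped] * window)
        dropped = not dropped
    return schedule[:num_packets]

# The uniform schedule almost never produces a long outage; the structured one
# guarantees sustained disconnections that stress recovery logic.
print(sum(uniform_drop_schedule(1000)), sum(pockets_of_chaos_schedule(1000)))
```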

Where do you see software development going in five or ten years if this new paradigm of testing takes off?

Ideally, devs can focus more on high-level logic and less on writing a million test permutations. We want a world where you specify your software's constraints - like "it shouldn't crash, it should maintain these invariants, it should respond within X time" - and let an intelligent system handle writing the test cases.

AI dev tools come into play here too. If an AI can generate large amounts of code quickly, there will be more code with potentially more bugs. But also, if we can integrate a strong testing approach behind the scenes, that code can get self-verified. Maybe you ask a language model for new functionality, it writes some code, Antithesis runs a state-space search, finds the broken scenario, then the AI fixes it, and so on. That might let us scale software creation far beyond current limits, because we can systematically ensure correctness.
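
If you sketched that loop as code - with every helper below a hypothetical stand-in invented for illustration, not a real Antithesis or LLM API - it might look something like this:

```python
# A hypothetical sketch of the generate/test/fix loop described above. All three
# helpers are trivial stand-ins, not real Antithesis or LLM APIs.
def generate_code(spec):
    return {"spec": spec, "attempt": 0}              # pretend an LLM drafted code

def run_state_space_search(code):
    # Stand-in for the autonomous search: report a counterexample until the
    # third attempt, then report the code as clean.
    return None if code["attempt"] >= 2 else f"counterexample #{code['attempt']}"

def request_fix(code, counterexample):
    return {**code, "attempt": code["attempt"] + 1}  # pretend the LLM patched it

def build_with_verification(spec, max_rounds=5):
    code = generate_code(spec)
    for _ in range(max_rounds):
        counterexample = run_state_space_search(code)
        if counterexample is None:
            return code                               # no violation found: done
        code = request_fix(code, counterexample)      # feed the failure back, retry
    raise RuntimeError("could not satisfy the spec within the round budget")

print(build_with_verification("don't crash; keep invariants"))
```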

Speaking of AI, do you see generative coding tools shifting Antithesis's roadmap, or is Antithesis already using AI?

In the short term, it's good for us that so much new code is being churned out - some of it by less experienced devs or by LLMs. That means more potential bugs to find. On a deeper level, we see big possibilities in integrating with coding agents: you give them a spec, they produce code, we automatically test it in a closed loop, and only return results once the code passes. That's definitely an idea we're exploring.

We're also experimenting with using AI to generate test harnesses and weird usage scenarios. If it calls an API "incorrectly," well, your program shouldn't crash anyway, so that's beneficial. A higher "temperature" can lead to more interesting test inputs. We're seeing promising early results from letting AI produce synthetic calling code that humans wouldn't typically think of.

Aside from AI, how will Antithesis evolve over the next 12 months?

Our biggest push is reducing the latency of getting results. We used to run in a batch mode, where you'd submit code and a few hours later get a bug report. Now we're moving toward a streaming model, so as soon as we find a bug, you get notified. This makes the feedback loop much tighter and more dev-friendly.

We're also broadening the problem domain. Right now, we're best known for unearthing fault tolerance issues in distributed systems, but the same fundamental approach can test websites, mobile apps, even games. Our architecture is general enough - it just needs a bit more productization. That expansion will open the door to a much larger market.

What can you tell us about the culture at Antithesis, and what roles are you hiring for?

We're extremely collaborative - everyone says that, but we really mean it. We often spend more time talking through design choices than coding, because once we figure out the correct approach, implementation goes faster. We're also an in-person company, located in D.C. That's unusual in the startup world, but it works for us, especially since we do a lot of deep systems work that benefits from real-time interaction.

We hire folks across a broad spectrum - from kernel hackers to front-end/UI people to machine learning researchers. If you're excited about deterministic simulation, advanced testing, or pushing the boundaries of software reliability, we want to talk. D.C. isn't the most common tech hub, but there are plenty of talented engineers here, and we love building a strong presence in the city.

Conclusion

Stay up to date on the latest with Antithesis and learn more about them here.

If you would like us to 'Deep Dive' a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.