Antithesis - the last word in autonomous software testing
Plus: CEO/Founder Will on why he believes AI-driven dev tools will benefit from rock-solid verification...

CV Deep Dive
Today, we're talking with Will Wilson, Founder and CEO of Antithesis.
Antithesis offers what some might call a radically new paradigm for testing and verifying complex software. By running your applications in a deterministic hypervisor and using an intelligent search to systematically break your code, Antithesis promises to catch the rare and seemingly impossible bugs that slip through standard integration tests or chaos engineering experiments. Will brings extensive experience from FoundationDB (acquired by Apple and recently disclosed to be underpinning all of DeepSeek's infrastructure) and Google's Spanner team, and much of the core Antithesis crew likewise hails from that same background.
Key Takeaways
Preempt Production Incidents with Deterministic Testing: Antithesis simulates all the things that could go wrong with your distributed system in a deterministic environment. This eliminates the reproducibility headaches that plague most large-scale system testing.
Autonomous Search: Rather than you writing a million test cases, Antithesis actively seeks out new and interesting behaviors in your software on its own, then shows you precisely how they happened.
Already Helping Startups & Enterprises Alike: While initial focus is on larger customers, small companies (like Turso DB) have also used Antithesis to secure bold projects like total rewrites.
Potential for AI Synergy: With GenAI producing ever more code, often from less expert devs, demand for deeper, more robust testing only grows. Meanwhile, Antithesis sees opportunities to use AI to generate novel test inputs and fix discovered bugs in a closed loop.
Future Expansion Beyond Distributed Systems: Though best known for finding fault tolerance bugs, Antithesis's approach can be leveraged to test a wide range of applications, from mobile apps to games to websites.
In this conversation, Will explains how the company's unique technology was inspired by the "impossible" distributed database days at FoundationDB, how intelligent search differs from random chaos testing, and why he believes AI-driven dev tools will benefit from rock-solid verification.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with Will 💬
Hey Will - welcome to Cerebral Valley! Could you give us a bit of intro on yourself and what led you to found Antithesis?
I'm Will, a software engineer by background who never really planned to start a company, but here I am. A lot of what inspired Antithesis came from my time at FoundationDB (acquired by Apple in 2015). We built a distributed database with ACID transactions and high fault tolerance. People said it was impossible because of the CAP theorem, but we did it. The key was this sophisticated autonomous testing system we created. It simulated arbitrary failure conditions, searched the entire "state space" to trigger weird bugs, and it let us deterministically reproduce any situation. That was a total game-changer. After Apple, I worked on Google Spanner and noticed they didn't have anything comparable. We realized there was a huge opportunity to bring that style of testing to the broader world.
A ton of my colleagues here are from FoundationDB, including my co-founder who used to be my boss there. Come to think of it, my boss at Apple now works for me as our VP of Engineering, and another former boss is an investor. I guess I have good relationships with bosses! But seriously, the vision is to free devs from writing endless test cases by letting an intelligent system break their software in a reproducible environment.
What exactly is Antithesis, for the uninitiated developer? What's the elevator pitch?
We flip the usual approach to testing on its head. Normally, you write tests for specific cases you think might matter, then hope you covered enough ground. In practice, you deploy to production and discover all kinds of insane edge scenarios you never imagined: routers delaying packets, machines shutting down mid-request, user input with 2^8 characters.
Antithesis starts from the end goal: you specify what your software is supposed to do or not do (e.g., "Don't crash"), and we systematically explore how to break that rule. We do this by injecting weird environment faults, bizarre inputs, or exotic usage patterns. Because we run everything in a deterministic environment, once we find a bug, you can re-run that exact scenario. No more "works on my machine, breaks on yours." It's like combining property-based testing with advanced fault injection, but at scale for big real-world apps.
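The property-based half of that combination can be illustrated with a minimal sketch (plain Python, not Antithesis code): generate random inputs from a fixed seed, check one declared property, and hand back a counterexample that replays identically every run. All names here are illustrative.

```python
import random

def check_property(fn, prop, seed, trials=1000):
    """Minimal property-based check: feed fn many random inputs and
    verify prop holds on each. The fixed seed makes every run
    reproducible, so a failing input can be replayed exactly."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        if not prop(data, fn(data)):
            return data  # the reproducible counterexample
    return None  # no violation found within the trial budget

# Property: a sort must keep every element and order the output.
def sorted_and_same_length(inp, out):
    return len(out) == len(inp) and all(a <= b for a, b in zip(out, out[1:]))

# A buggy "sort" that silently drops duplicates.
def buggy_sort(xs):
    return sorted(set(xs))

counterexample = check_property(buggy_sort, sorted_and_same_length, seed=42)
print(counterexample)  # some list containing duplicates, which exposes the bug
```

Running the same check against Python's built-in `sorted` returns `None`: the property holds, so no counterexample exists to report.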
You mentioned a deterministic state space that you rigorously test through. What does that mean, exactly?
Conventional software is inherently non-deterministic. In real life, a program can spawn threads that run in arbitrary orders, send and receive network traffic with random delays, check the time at different moments, or generate random numbers. That means bugs often appear sporadically, or only 1 in 1,000 times (but Murphy's Law means it'll happen at the exact wrong moment in prod).
We built a hypervisor that forces your whole system to be deterministic: thread scheduling, network responses, everything. If a thread is about to run, we decide the scheduling in a repeatable way, so if we see a bug, we can replay the entire scenario. That transforms debugging: no more ephemeral "heisenbugs." Our approach systematically covers a huge range of possible interleavings and fault conditions.
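A toy version of the idea, assuming nothing about the real hypervisor: if a seeded random number generator makes every scheduling decision, then two "threads" racing on a shared counter interleave identically on every run with the same seed, so a lost-update bug replays bit-for-bit instead of appearing sporadically.

```python
import random

def run_schedule(seed):
    """Toy deterministic scheduler: two 'threads' each increment a shared
    counter three times non-atomically (separate load and store steps).
    The seed fully decides the interleaving, so any run can be replayed."""
    rng = random.Random(seed)
    counter = 0
    threads = {t: iter(range(6)) for t in ("A", "B")}  # 6 steps = 3 increments
    loaded = {}
    trace = []
    while threads:
        t = rng.choice(sorted(threads))  # the seed decides who runs next
        step = next(threads[t])
        if step % 2 == 0:
            loaded[t] = counter          # load step
        else:
            counter = loaded[t] + 1      # store step: classic lost-update race
        trace.append((t, step, counter))
        if step == 5:
            del threads[t]               # thread finished its 3 increments
    return counter, trace

# Same seed => identical interleaving, identical trace, identical final state.
c1, t1 = run_schedule(seed=7)
c2, t2 = run_schedule(seed=7)
assert (c1, t1) == (c2, t2)
print(c1)  # may be less than 6 when this interleaving loses updates
```

The point of the sketch is the replay property, not the scheduler itself: once the bug-triggering seed is known, the "heisenbug" becomes an ordinary, debuggable failure.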
Is your system primarily geared toward enterprise customers, or could smaller shops or individuals use it too?
Right now, we're mostly oriented toward enterprise, because that's where we see the biggest immediate ROI: large companies want to reduce outages and production firefighting. We do have some small startups as well, including pre-seed ones, but the overall product can be quite expensive in its current form. Over time, we plan to make a more polished, self-serve version with a cheaper or free tier. That's definitely on the roadmap.
What are the key output metrics you look for to determine success, and do you have any success stories that show a drastic change in those metrics?
Number one is reducing production incidents and outages. If you're shipping fewer bugs and spending less time on firefighting, that's huge. We had one customer tell us that their "support people were getting bored," and that made my day.
Another is developer productivity: how much time are teams spending on writing and maintaining tests? How much are they wasting triaging and investigating weird, non-reproducible issues? The latter task especially tends to fall on the most senior and valuable members of the team. When we free them up to write features instead, that's a huge win.
Then there's the "frontier of what's possible": can devs tackle projects they never dared attempt before? Turso DB is a great example. They wanted to rewrite SQLite from scratch, but initially thought it was too risky. How do you test something that big and complicated? After working with us, they felt safe to do it, because we systematically hammered on the new code until it was stable. That's the kind of "frontier" effect we love seeing.
This sounds like a novel approach. Who do you see as competition, or the "incumbents," in the testing space?
There's not much direct competition doing exactly what we do. The biggest "competitor" is often a homegrown system at large companies, usually some janky fault-injection approach they built internally. Then you have chaos testing, popularized by Netflix, which is basically introducing random disruptions in production. For smaller or stateless things, there's fuzzing or property-based testing, but those rarely scale to big distributed apps.
We do see some teams building their own deterministic simulation, but that's an enormous effort. For most, it's easier to go with a vendor approach. So in short, there's no one else systematically combining deterministic simulation, intelligent search, and environment fault injection the way we do.
You mentioned a moat around the insights you gain from multiple customers' code. Can you share any interesting or surprising insights from that?
Every new customer adds more diversity to our "training corpus." That means we're less likely to overfit on just one type of system. We do see broad patterns, like how pure uniform randomness is actually not that useful. If you simply drop 5% of packets or randomly pick functions to call, you might never hit the weird corner where you call function A 100 times in a row without an intervening call to function B.
Structured randomness is more powerful. We'll do, for instance, periods of complete disconnection, then normal traffic, or sequences that call function A repeatedly in case it contains a memory leak that gets cleaned up by function B. This can reveal deeper bugs that uniform sampling would never touch. Another big theme is that real-world usage is rarely a simple distribution, so we design our search to systematically produce "pockets of chaos" instead of random mild chaos everywhere.
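The difference between the two styles of randomness is easy to see in a sketch (illustrative Python, not how Antithesis actually schedules faults): a uniform 5% packet drop almost never produces a sustained outage, while "pockets of chaos" regularly blacks out whole windows, exercising the timeout and reconnect paths.

```python
import random

def uniform_faults(n, rng, p=0.05):
    """Uniform chaos: each packet independently dropped with probability p."""
    return [rng.random() < p for _ in range(n)]

def structured_faults(n, rng, burst=20):
    """Structured chaos: alternate healthy windows with total blackouts.
    A sustained disconnection exercises timeout and reconnect logic that
    a scattered 5% drop rate almost never reaches."""
    drops, i = [], 0
    while i < n:
        blackout = rng.random() < 0.3  # roughly 30% of windows are blackouts
        drops.extend([blackout] * min(burst, n - i))
        i += burst
    return drops

def longest_outage(drops):
    """Length of the longest run of consecutive dropped packets."""
    best = cur = 0
    for d in drops:
        cur = cur + 1 if d else 0
        best = max(best, cur)
    return best

rng = random.Random(0)
u = uniform_faults(1000, rng)
s = structured_faults(1000, rng)
print(longest_outage(u), longest_outage(s))  # structured chaos: far longer outages
```

The same shape applies to API calls: hammering function A in a tight loop is a "burst," where a uniform sampler would politely interleave A and B and never surface the leak.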
Where do you see software development going in five or ten years if this new paradigm of testing takes off?
Ideally, devs can focus more on high-level logic and less on writing a million test permutations. We want a world where you specify your software's constraints, like "it shouldn't crash, it should maintain these invariants, it should respond within X time," and let an intelligent system handle writing the test cases.
AI dev tools come into play here too. If an AI can generate large amounts of code quickly, there will be more code with potentially more bugs. But also, if we can integrate a strong testing approach behind the scenes, that code can get self-verified. Maybe you ask a language model for new functionality, it writes some code, Antithesis runs a state-space search, finds the broken scenario, then the AI fixes it, and so on. That might let us scale software creation far beyond current limits, because we can systematically ensure correctness.
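That loop can be sketched in a few lines. Everything below is hypothetical: `generate`, `repair`, and `spec_check` are toy stand-ins for an LLM and an autonomous search, not real Antithesis or model APIs.

```python
def verified_generate(spec_check, generate, repair, max_rounds=5):
    """Hypothetical closed loop: 'generate' and 'repair' stand in for an
    LLM, while 'spec_check' stands in for an autonomous search that
    returns a counterexample (or None if the spec holds)."""
    code = generate()
    for _ in range(max_rounds):
        bug = spec_check(code)
        if bug is None:
            return code              # only verified code leaves the loop
        code = repair(code, bug)     # hand the reproduced failure back
    raise RuntimeError("could not verify within budget")

# Toy stand-ins: 'code' is just a function, the spec is abs()-like behavior.
def buggy(x):
    return x                         # wrong: negative inputs stay negative

def fixed(x):
    return -x if x < 0 else x

def spec_check(fn):
    for x in range(-10, 10):         # a tiny exhaustive 'search'
        if fn(x) < 0:
            return x                 # counterexample: output went negative
    return None

result = verified_generate(spec_check, lambda: buggy, lambda code, bug: fixed)
assert result is fixed               # the buggy draft never escapes the loop
```

The essential property is the gate: code is only returned once the search stops finding counterexamples, regardless of how many drafts the generator needed.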
Speaking of AI, do you see generative coding tools shifting Antithesis's roadmap, or is Antithesis already using AI?
In the short term, it's good for us that so much new code is being churned out, some of it by less experienced devs or by LLMs. That means more potential bugs to find. On a deeper level, we see big possibilities in integrating with coding agents: you give them a spec, they produce code, we automatically test it in a closed loop, and only return results once the code passes. That's definitely an idea we're exploring.
We're also experimenting with using AI to generate test harnesses and weird usage scenarios. If it calls an API "incorrectly," well, your program shouldn't crash anyway, so that's beneficial. A higher "temperature" can lead to more interesting test inputs. We're seeing promising early results from letting AI produce synthetic calling code that humans wouldn't typically think of.
Aside from AI, how will Antithesis evolve over the next 12 months?
Our biggest push is reducing the latency to get results. We used to run in a batch mode, where you'd submit code and a few hours later get a bug report. Now we're moving toward a streaming model, so as soon as we find a bug, you get notified. This makes the feedback loop much tighter and more dev-friendly.
We're also broadening the problem domain. Right now, we're best known for unearthing fault tolerance issues in distributed systems, but the same fundamental approach can test websites, mobile apps, even games. Our architecture is general enough; it just needs a bit more productization. That expansion will open the door to a much larger market.
What can you tell us about the culture at Antithesis, and what roles are you hiring for?
We're extremely collaborative. Everyone says that, but we really mean it. We often spend more time talking through design choices than coding, because once we figure out the correct approach, implementation goes faster. We're also an in-person company, located in D.C. That's unusual in the startup world, but it works for us, especially since we do a lot of deep systems work that benefits from real-time interaction.
We hire folks across a broad spectrum, from kernel hackers to front-end/UI people to machine learning researchers. If you're excited about deterministic simulation, advanced testing, or pushing the boundaries of software reliability, we want to talk. D.C. isn't the most common tech hub, but there are plenty of talented engineers here, and we love building a strong presence in the city.
Conclusion
Stay up to date on the latest with Antithesis, learn more about them here.
Read our past few Deep Dives below:
If you would like us to āDeep Diveā a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.