DeepSource - The Vulnerabilities Every AI Tool Missed
Plus: CEO Sanket Saurav on why nobody has a mental model of their code anymore, the day he accepted static analysis was dead, and the 7-year infrastructure moat that just outperformed every major player in AI code review...

CV Deep Dive
Today, we're talking with Sanket Saurav, Co-Founder & CEO of DeepSource.
DeepSource is the AI code review platform that uses a hybrid analysis engine — combining seven years of static analysis infrastructure with an AI review agent — to catch security vulnerabilities and code quality issues that no other tool can find. The company just launched AI Review, the biggest change to their platform since founding. Their hybrid engine layers an AI agent on top of 5,000+ static analyzers, data-flow graphs, and taint maps — giving the AI structured access to how code actually behaves before it starts reasoning.
On the public OpenSSF CVE Benchmark, DeepSource leads on overall accuracy at 82.42% — ahead of OpenAI Codex, Devin, Cursor BugBot, Greptile, Claude Code, CodeRabbit, and Semgrep. All raw data, methodology, and judge verdicts are published publicly. DeepSource has over 1 million repositories connected and serves enterprise customers running thousands of scans per day — including teams in government, health tech, and regulated environments. The company has done this on $7.7M in total funding. Five days before DeepSource's launch, Anthropic shipped Claude Code Security and sent cybersecurity stocks crashing. Sanket's thesis: neither static-only nor AI-only is the answer. Hybrid analysis is the future of code review.
The new Team Plan includes unlimited static analysis, code coverage, secrets detection, and $120/year in bundled AI Review credits per contributor. 14-day free trial, no seat minimums.
In this conversation, Sanket walks us through the moment he accepted that static analysis alone was dead, explains why LLMs get “distracted” when reviewing code without a baseline, and makes the case that DeepSource's seven-year infrastructure moat is exactly what AI needs to review code reliably.
Let’s dive in ⚡️
Read time: 9 mins
Our Chat with Sanket 💬
Sanket — welcome to Cerebral Valley! You started coding on your cousin's computer in Bihar, went through multiple startups, Y Combinator, and you're now seven years into building DeepSource. Give us the origin story — what problem did you see in 2018 that made you want to build this?
We didn't have a computer at home, so I learned on my cousin's machine when I was about eight years old. Started with games like Road Rash, then Paint, then BASIC, then HTML. Long way from there to here.
When we were starting DeepSource, my co-founder Jai and I had worked together at my previous startup for a few years. We were looking at problems we were facing every day rather than scanning the market — solve problems you have for yourself. Code quality and code security were things we'd learned the hard way. I was 21, fresh out of college, first startup. A lot of the lessons around engineering and building products, we learned by getting burned.
One of the things we realized is: how do you actually make sure you're writing good code? This was 2018 — pre-AI, pre-automation, very little automated tooling that people were using around their code. The idea was figuring out how to automate the things people need to automate when they want to make sure their code is secure, correct, and clean.
We started looking at existing solutions. We'd used some of these products before and weren't happy with the user experience — not the UI, but the actual workflow that developers have. DeepSource started with the thesis that we could build a new static analysis engine focused on speed, fewer false positives, and a better developer experience. Unlike a lot of other companies that were taking open-source linters and wrapping them in a web UI, we built our own static analysis engine and our own runtime from the ground up. That gave us control over performance, over false positives, and over behavior — and then we backported all the open-source checkers so our users get the best of both worlds.
You took a bottoms-up approach when everyone else was selling top-down. How did that philosophy shape the company's DNA, and how does it show up in the hybrid product you've just launched?
When we first started, people were very skeptical of automation in their code. We built this thing called Autofix back in 2020. It was a novel idea back then. We'd gone through YC — Winter 2020 — and in March 2020 we launched with a static analysis engine that could find issues, with support for Python, Go, and JavaScript.
Then we thought: why can't we automatically fix some of these issues? We built a static analysis-based remediation engine using concrete syntax trees, writing fixers for each rule we could catch. You could go to DeepSource, see an issue, click on it, and we'd automatically generate the fix — show you the patch, and you click commit. That was the first time anyone in the industry was doing that.
The con was we could only fix about 25% of issues — it was completely deterministic, completely handwritten. But the concept was novel.
I distinctly remember during YC, we were showing people Autofix and the CTO of a company would say, “How can I trust that you won't break my code?” And today you have hundreds of agents YOLOing with `--dangerously-skip-permissions`. The culture shift has been dramatic.
But when the LLM coding models got better, we realized we could do autofix properly. We ripped out our deterministic engine and replaced it with an LLM-based agent. We made the product super easy to use. Sign up, connect a repository, run an analysis, see results in ten minutes. This was not possible with our larger competitors at the time.
You just launched AI Review, which you've called the biggest change to DeepSource since you started the company. Walk us through the moment you realized that static analysis alone was no longer enough. What was that signal?
It happened slowly, and then all at once.
Our core business was static analysis — code quality, code security, secrets detection. We have customers deploying DeepSource across thousands of developers in regulated environments, government agencies, health tech companies. Static analysis was serving them well.
But then our customers started telling us: “I'm adopting AI coding agents. My PR volume is growing exponentially.” Now, the volume alone isn't the problem — static analysis is 100% deterministic, you're not spending tokens, we can handle that. But as the number of PRs and the amount of generated code keeps growing, nobody has a mental model of everything they're writing anymore.
That started happening slowly over the last year and a half, and then starting in December, all at once — when Claude Code got into the zeitgeist. Serious organizations like Ramp and Stripe released blog posts about having hundreds of agents running internally, doing work. Our customers were saying the same thing.
The light bulb moment was when people started saying: “the amount of code that we have to review is the primary blocker for us to actually ship.” There's a difference between writing a lot more code and actually shipping all of that. People are making 10x the pull requests with 10x the volume of code, but why aren't they making 10x new revenue, and why aren't their customers 10x happier?
We realized that static analysis objectively cannot solve this, because it's limited by the number of rules it can see. Now you have a whole bunch of unknown unknowns that you don't have rules for. Previously, the sea of unknowns was there, but since each PR was small, you had eyes on it. Now you're getting PRs of 500 files and you're just like, I can't do it.
That was a difficult day. That was the acceptance that static analysis-only is dead. The set of rules written by hand, by humans — it's just obsolete. And that's where we started thinking about the new product.
You're taking what you call a “hybrid analysis” approach, where the AI agent has structured access to data-flow graphs, taint maps, and 5,000+ static analyzers before it starts reasoning. Break down how this works in practice, from the moment a developer opens a PR.
The core workflow is still the same. You run your local loop — talking to your agent, asking it to do work. Once that loop is complete, you make a pull request. Because a few things haven't changed in the development workflow: every single change needs to go as a PR to your repository, pass your checks, pass your CI before it gets merged and deployed.
The moment you make a pull request, DeepSource runs all of its analyzers. And here's a key differentiator — instead of a blanket review, we have separate analyzers for different programming languages, each tuned to find problems in that language. We're not just running one LLM loop saying “here's your system prompt, go find issues.” We've built agents within each analyzer, tuned to finding problems for that specific programming language.
When the analyzers run, we leave comments on your GitHub pull request and show you checks. Python analyzer, JavaScript analyzer — you can gate on that. If our reviewer finds a security vulnerability, you can configure GitHub to automatically block the merge.
We also have a CLI coming soon that you can run locally, so instead of waiting to make a PR, you can run analysis locally before you push.
The most common failure mode you've found with LLM-only tools isn't wrong analysis — it's zero output. The model just skips the vulnerable code entirely. Why does that happen, and how does the hybrid approach fix it?
The LLM gets distracted.
You can solve that by using a top-of-the-line model like Opus with maximum reasoning. But that's not practical when you need to scan hundreds of lines of code on every PR — unless you're Anthropic and you don't care about your tokens. Even then, the model gets distracted by other things it sees. There's research showing that if code has a lot of stylistic issues or unrelated problems, the LLM tends to skip the main security vulnerabilities.
The second problem is cold start. If you start the LLM review from scratch — just give it a file and say “find security issues” — it doesn't know where to look. It goes in without a baseline. It can find issues, of course, but the behavior isn't deterministic. You can't trust it.
What we found empirically in our benchmarks is that LLM-only tools miss vulnerabilities not because they don't know about the vulnerability, but because they don't know to look for it.
If you actually ask the model, “Can you find this specific vulnerability in this code?” — of course it'll find it. But that's not the point. It's unknown unknowns. How do you know what to look for?
So we do two things with static analysis. First, we seed the LLM with everything we've already found. Instead of the LLM going in blind, we're already telling it: “By the way, we've found these security vulnerabilities in this code.” That provides a baseline, but it also indicates possible failure modes in the codebase — because a lot of things are related.
Think about it like this: if you invite me to your house and your living room is a mess — clothes lying around, things out of place — I see the mess, but I also make a second-order observation about the kind of household this is. In code, if we've statically found SQL injections or XSS vulnerabilities, that doesn't just tell the LLM what's already broken. It gives the LLM signal about where to look next, as well as an idea of the kinds of problems this codebase is likely to have.
Second, we create what we call “stores” — different representations and metadata about the code generated by static analysis: dependency trees, data-flow graphs, control-flow graphs, sources and sinks for taint analysis. We expose these stores to the AI agent as tool calls. Instead of the LLM needing to grep through your entire codebase, burning tokens and growing its context window until it gets confused, it can make one tool call and figure out the entire data flow. One tool call to determine if a function is interfacing with user input. One tool call to trace how data moves through your application.
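As a rough illustration of what exposing those stores as tool calls could look like, here is a minimal sketch in JavaScript. Everything here is hypothetical — the store shapes, the function names (`touchesUserInput`, `getDataFlow`), and the example data are invented for illustration, not DeepSource's actual API:

```javascript
// Hypothetical sketch: precomputed static-analysis "stores" exposed to a
// review agent as tool handlers, so the model queries code structure in one
// call instead of grepping files and growing its context window.

// Stores built ahead of time by static analysis (illustrative data).
const stores = {
  // Data-flow edges: which values flow into which functions.
  dataFlow: {
    handleLogin: ["req.body.username", "req.body.password"],
    renderPage: ["db.users.bio"],
  },
  // Taint analysis: attacker-controlled sources and sensitive sinks.
  taint: {
    sources: ["req.body", "req.query", "req.headers"],
    sinks: ["db.query", "res.send", "child_process.exec"],
  },
};

// Tool handlers the agent can invoke instead of reading raw files.
const tools = {
  // One call answers: "does this function receive user input?"
  touchesUserInput(fnName) {
    const inputs = stores.dataFlow[fnName] || [];
    return inputs.some((v) =>
      stores.taint.sources.some((src) => v.startsWith(src))
    );
  },
  // One call returns the recorded data flow for a function.
  getDataFlow(fnName) {
    return stores.dataFlow[fnName] || [];
  },
};

console.log(tools.touchesUserInput("handleLogin")); // fed by req.body.*
console.log(tools.touchesUserInput("renderPage")); // only db-sourced data
```

The point of the design is token economy: the deterministic work (building the graphs) happens once, up front, and the agent spends its reasoning budget only on the questions static analysis can't answer.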
If you summarize this entire conversation into one line: using static analysis, we are making it easier for an AI model to review code — so it gives you better results.
On the OpenSSF CVE Benchmark, DeepSource hit 82.42% accuracy — ahead of every other tool you tested. You also found vulnerabilities that every single other tool missed. What do those edge cases tell us about where the pure-AI approach falls short?
First, a caveat: our benchmarks use Opus 4.5, not Opus 4.6, the latest model. We'll update those. And for Claude Code Security — we don't have access to the newly announced product; it's limited access. So we're not claiming we're better than Claude Code Security specifically. What we benchmarked against was Claude Code's /security-review command, which scans the changes on a branch.
That said, we can make deductions. Anthropic has specifically called out that Claude Code Security is not using static analysis — it's LLM-only, so our hypothesis still holds.
Our hypothesis, the one we're betting the company on, is this: static harness plus a top-of-the-line model will always perform better than just the same model alone. If Claude or OpenAI or whoever says “use this model for reviews” — our pitch to customers is: bring that model into our platform, and the static harness will still give you an edge.
The reasoning behind that is sound. We're making it easier for the model to review code. The infrastructure for that — the static analysis, the code intelligence stores, the 5,000+ checkers — that took us seven years to build. Writing an AI code review agent is not that difficult. You and I could sit down, fire up Replit, and build a prototype in half an hour. But the infrastructure underneath it — that's the hard part.
The Vulnerabilities Every Other Tool Missed
Across the 165 CVEs in the benchmark, DeepSource's hybrid engine caught critical vulnerabilities that all seven other tools missed entirely. Three stood out:
Open redirect via protocol-relative URLs (CVE-2017-16224): A static file server used the raw request URL in a redirect header without validation. An attacker could craft a request like //evil.com/%2e%2e — browsers interpret the // prefix as a protocol-relative URL, silently redirecting users to a malicious domain. Every other tool focused on path traversal issues in the same codebase. None recognized the redirect pattern.
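A minimal sketch of this bug class (illustrative code, not the actual CVE-2017-16224 patch): a naive "starts with /" check accepts `//evil.com/...`, which browsers resolve as a protocol-relative absolute URL:

```javascript
// Vulnerable pattern: "starts with /" looks like a same-site path, so the
// raw request URL is echoed into the Location header unvalidated.
function vulnerableRedirectTarget(rawUrl) {
  return rawUrl.startsWith("/") ? rawUrl : "/";
}

// Hardened pattern: reject "//" and "/\" prefixes, which browsers treat as
// scheme-relative URLs pointing at another host.
function safeRedirectTarget(rawUrl) {
  if (
    rawUrl.startsWith("/") &&
    !rawUrl.startsWith("//") &&
    !rawUrl.startsWith("/\\")
  ) {
    return rawUrl;
  }
  return "/";
}

console.log(vulnerableRedirectTarget("//evil.com/%2e%2e")); // passes the naive check
console.log(safeRedirectTarget("//evil.com/%2e%2e"));       // normalized to "/"
```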
XSS filter bypass via encoded control characters (CVE-2019-1010091): TinyMCE's sanitizer checked URIs against a regex for javascript: schemes — but URL-encoded control characters embedded in the scheme name (like java%0dscript:) caused the regex to fail while browsers still executed it. Four tools found zero issues. Two flagged unrelated bugs.
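The bypass class can be demonstrated in a few lines (a simplified sketch, not TinyMCE's actual sanitizer): the regex checks the raw string, while browsers decode the URI and strip tab/CR/LF characters before resolving the scheme:

```javascript
// Naive check: a literal-scheme regex on the raw, still-encoded URI.
const blocklist = /^javascript:/i;
function naiveIsBlocked(uri) {
  return blocklist.test(uri);
}

// Roughly what a browser does before resolving the scheme: percent-decode,
// then strip tab, CR, and LF characters.
function browserNormalize(uri) {
  return decodeURIComponent(uri).replace(/[\t\n\r]/g, "");
}

const payload = "java%0dscript:alert(1)";
console.log(naiveIsBlocked(payload));                   // regex misses it
console.log(browserNormalize(payload));                 // "javascript:alert(1)"
console.log(naiveIsBlocked(browserNormalize(payload))); // caught only after normalizing
```

The general lesson: a sanitizer must validate the URI as the browser will interpret it, not as it arrives on the wire.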
Type system manipulation via prototype chain (CVE-2019-19507): A JSON validation library used constructor.name for type checking. An attacker could forge this property with { constructor: { name: "Array" } }, making the validator believe a plain object was an Array and bypassing all type-based validation. Three tools found nothing. The rest flagged unrelated issues.
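The forgery itself fits in a few lines (the validator here is illustrative, not the actual library's code): an own `constructor` property shadows the one inherited from the prototype, so any check built on `constructor.name` is attacker-controlled:

```javascript
// Vulnerable pattern: trusts a property the attacker can set in JSON input.
function naiveIsArray(value) {
  return value != null && value.constructor.name === "Array";
}

// Safe pattern: Array.isArray checks the internal slot, not own properties.
function robustIsArray(value) {
  return Array.isArray(value);
}

// A plain object from untrusted JSON, with a forged constructor property.
const forged = JSON.parse('{"constructor": {"name": "Array"}}');

console.log(naiveIsArray(forged));  // true  — validation bypassed
console.log(robustIsArray(forged)); // false — it is a plain object
console.log(naiveIsArray([1, 2]));  // true  — real arrays still pass
```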
The pattern: DeepSource consistently catches vulnerabilities that require multi-step reasoning about how attacker-controlled data flows through language-specific runtime behaviors. These are the classes of vulnerabilities that cause real-world security incidents — and they're the hardest to find.
You published all your raw benchmark data, methodology, and judge verdicts publicly. Most companies don't do that. Why the transparency?
It's truth-seeking for ourselves. We spent a lot of time building static analysis, and now static analysis-only is dead. So where does that leave us? We needed to prove — to ourselves first — that the hybrid approach actually works.
We ran the benchmarks and the data came out well. If you dig into the benchmarks page, there's a table that transparently shows all the metrics. In some cases we rank lower on individual metrics. In some cases another tool ranks higher on recall or precision. But on the metric we think matters most, which is accuracy, we're at the top, and we've justified why we think accuracy is the right metric.
We haven't heard anyone counter us. I think that's the benefit of making the raw data public. A lot of companies publish benchmarks where you just have to trust them. Nobody really knows. We didn't want that.
There's a lot of copium in the market right now. If you go on LinkedIn, you see founders of legacy security companies saying, “We don't even know how this works, this isn't publicly available” — clutching their pearls. People are in denial that this is going to disrupt them. I 100% agree that AI can do better code review than any human being and any static harness alone. We didn't lose customers, but I know we will if we don't evolve. So we're betting the company on the hybrid approach.
DeepSource has over a million repositories connected on $7.7M in total funding. That's an extremely capital-efficient path. Has the hybrid approach changed how you think about the business?
Having static analysis does help us keep our margins better. Instead of burning tokens on things the LLM doesn't need to do — exploration, grepping, building context — we handle all of that deterministically. I'd rather the agent in the loop spend tokens on finding issues and making the deductions that static analysis can't, rather than doing things that can be done statically, faster and cheaper.
Here's the dichotomy. On one side, we're saying static-only is dead. On the other side, we're investing more in static analysis to build better harnesses and better tools for our AI agent. Static analysis-only is dead. But static analysis as a technology — we are fully behind the value that it brings. We're hiring people to work on the agent side, but also to build better static analysis tools, like reachability analysis for third-party dependency scanning. These things are deterministic and can be done statically, much faster than an LLM.
Your new pricing bundles AI Review credits into the Team Plan at $120/year per contributor. How are you thinking about pricing when every tool in the market is racing to add AI?
We've been priced at $30 per month since day one. Pay annually and you get a 20% discount, which gets you to $24. What we did was build a blended model: you get full platform access for the same price, and we subsidize the token cost — $120 per year, which translates to $10 per month, per developer, for AI Review credits.
The key value is this: if you have a budget of $100 for AI analysis and you burn through it, your static analysis keeps working. You still get coverage, still get reporting, still get all the other features that none of the pure-AI players currently offer. That's the better deal.
For enterprise, you can run DeepSource entirely on your own infrastructure — GCP, AWS — and bring your own API key. A lot of our enterprise customers have special deals with model providers where they get subsidized tokens. So you're in control.
You've built DeepSource across Bengaluru and San Francisco with a small team. With the hybrid approach requiring deep expertise in both static analysis and AI, what does the ideal DeepSource engineer look like?
Historically, we've hired people who are interested in program analysis — which is a very niche area. Not everyone wants to work on static analysis. But now a lot of work has moved to the agent side. Culturally, we're a small team, high accountability, freedom to work on what you want. Since we have analyzers across different programming languages, you get to explore a lot.
We're hiring for engineering and GTM, both in San Francisco. On the GTM side, we're looking for people with experience in early go-to-market for dev tools. It's an inflection point for the company — we've got something that works, and now we need to scale it.
What's the most contrarian thing you believe about code security right now that most people in the industry would disagree with?
A lot of people are still clutching their pearls. There's a lot of copium in the market. Legacy security companies have built static tooling for 10, 15 years and have a lot of revenue. If I tell them that's going to go away, of course they'll push back.
But if you take a step back — the way that people code has objectively changed, almost overnight. It happened slowly and then it happened all at once. That changes the expectations of a code security tool. You're not in a world where a code security tool is just a helper to a human reviewer anymore.
It used to be okay — in the end you have an AppSec team, in the end you have a senior engineer looking at it. But that world doesn't exist anymore, because you are never going to have enough people who could possibly look at all the code and review it with their sanity intact. Everything that was built is obsolete unless you rewrite the entire product for a world where nobody has a mental model of what they're building. It's a brave new world.
Last one — what's next for DeepSource? You mentioned an MCP server is coming. Where does the product go from here?
We're leaning into making products and services that naturally work with coding agents. Right now, people run a loop where they ask their agent to keep doing something until it's done. What we want to do is bring that same experience for ensuring code quality and security — build feedback loops with the hybrid agent that you can plug into your coding agent, so it can iterate on the feedback automatically.
We're currently focused on code review as the umbrella term — code quality and security. But we also want to do the same for third-party dependency scanning. We already have a solution that's static-heavy, but we're going to make that into a hybrid agent as well. Drop this tool into your repository, and it'll iteratively figure out all your third-party dependencies, determine which ones are insecure, figure out which insecure ones actually matter for you to upgrade, and help your agent automatically do that.
If agents are writing most of your code, it makes sense to optimize the review process for agents too. We've been building for humans. Now we're building for the agents — and the humans overseeing them.
DeepSource is hiring for engineering and GTM roles in San Francisco. Get started with a 14-day free trial on the Team Plan. Check out the OpenSSF CVE Benchmark results and the full launch announcement.
Stay up to date on the latest with DeepSource, follow them here.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on X or LinkedIn.