
Build specialized RAG agents with Contextual AI 🌐

Plus: CEO Douwe Kiela, one of the founding pioneers of RAG, on unlocking value in enterprise AI...

CV Deep Dive

Today, we’re talking with Douwe Kiela, Co-founder and CEO of Contextual AI.

Contextual AI helps enterprises build specialized RAG agents to support expert knowledge work, especially in technical domains and regulated industries. Founded by Douwe Kiela - one of the pioneers of RAG from his time at FAIR - and Amanpreet Singh, the company enables organizations to tackle complex, high-value use cases across large-scale structured and unstructured corporate data on an end-to-end optimized platform dubbed “RAG 2.0”.

Last week, Contextual AI announced that its platform is now generally available. The company also published a set of RAG-QA Arena benchmarks showing it outperformed leading models such as GPT-4o and Claude 3.5 Sonnet in end-to-end accuracy. Douwe and the team have optimized Contextual AI’s platform to have the most impact in highly regulated industries such as finance, technology, and engineering - where accuracy is critical and complexity is abundant.

In this conversation, Douwe shares how Contextual AI evolved from his foundational work at FAIR and Hugging Face, the technical challenges of scaling AI for millions of data points, and why specialization is critical for unlocking the next wave of value in enterprise AI.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Douwe 💬

Douwe - welcome to Cerebral Valley! First off, tell us a bit about your background and what led you to start Contextual AI.

Hi there! I’m Douwe Kiela, CEO of Contextual AI, and I’m also an adjunct professor at Stanford—that’s my side job for fun. Before this, I was Head of Research at Hugging Face, and prior to that, I spent over five years at FAIR, Facebook AI Research. I joined FAIR relatively early, back in the early days of the AI revolution, when everything was still very research-oriented.

One of the things I worked on there that I’m known for is Retrieval-Augmented Generation (RAG), which is the core method for making Gen AI work effectively with your data. RAG has evolved significantly since then, and Contextual AI was really founded during my time at Hugging Face. My co-founder and I both moved from FAIR to Hugging Face, and after ChatGPT’s release, we could see the world was starting to view Gen AI very differently. Everyone was understandably excited—it’s a transformative technology.

At the same time, though, there was a lot of frustration because the technology wasn’t quite ready for prime time. We knew that RAG would be a critical part of solving that problem. Having co-developed the original RAG alongside some brilliant collaborators, we also knew there was room to improve. That’s the foundation of our company: what we call RAG 2.0.

It’s our proprietary technology, and it outperforms the original RAG with higher accuracy, reduced hallucinations, and better attribution—all the things you care about when deploying these systems. These capabilities are especially important in industries where the tolerance for errors is low, like finance, technology, engineering, and professional services. In regulated sectors, you need audit trails, enterprise-grade security, and compliance. All of these requirements are met by our Contextual AI platform.

How would you describe Contextual AI to an AI engineer or enterprise who’s slightly less familiar with what you do? 

The Contextual AI platform is a way to build specialized RAG agents tailored to your specific use cases. It really shines in scenarios where there’s a lot of data that can be noisy, and you’re trying to solve complex problems on top of that data.

To draw a contrast - if you want a chatbot to answer straightforward questions like, “Who is our 401k provider?”, some of our competitors are probably good enough for that. But if you have more specific questions that require specialist knowledge, that’s where Contextual AI stands out.

We focus on knowledge professionals. Our specialized RAG agents are just much better at solving these kinds of problems.

Who are your customers today? Talk to us a little bit about the evolution of your earliest users up to your most recent announcement. 

When we started the company, we worked closely with design partners. It’s kind of the standard play when you’re an enterprise startup. We collaborated closely with these companies to develop the platform.

With the announcement we made two days ago, we’ve officially made the platform generally available for the first time. It’s the product of all the work we did with our design partners, and now it’s accessible for everyone to use.

The verticals we focus on—finance, technology, engineering, and professional services—are really a reflection of the design partners we worked with. These are areas where we’ve seen a lot of traction and differentiation, where existing solutions just aren’t as good as we are.

You’ve built Contextual AI on the concept of RAG 2.0 after pioneering the original RAG. Could you give us a top-level sense of the difference between the two? 

When we worked on the original RAG, the actual research question was: how can we jointly optimize the components? How can we make sure that the generator (the language model) works well together with the retriever?

Since then, I think a lot of folks have stopped thinking about the optimization problem and have just gone straight to in-context learning. So, you just take off-the-shelf embeddings and then an off-the-shelf language model, which is good enough for building demos and maybe answering questions like “who’s our 401k provider?” But it’s not good enough for solving specialized problems.
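For reference, the naive pattern Douwe is describing - off-the-shelf embeddings plus an off-the-shelf language model, glued together with in-context learning - can be sketched in a few lines of Python. The embed_texts and complete_llm functions below are placeholders we invented for illustration, not any particular vendor’s API:

```python
import numpy as np

# Stand-in for an off-the-shelf embedding model: deterministic per text,
# so identical strings map to identical vectors. Swap in a real
# embeddings API in practice.
def embed_texts(texts: list[str]) -> np.ndarray:
    return np.stack([
        np.random.default_rng(abs(hash(t)) % 2**32).normal(size=128)
        for t in texts
    ])

# Stand-in for an off-the-shelf LLM completion call.
def complete_llm(prompt: str) -> str:
    return f"[LLM answer based on a {len(prompt)}-char prompt]"

def naive_rag(question: str, documents: list[str], k: int = 3) -> str:
    # Embed the corpus and the question; retrieve top-k by cosine similarity.
    doc_vecs = embed_texts(documents)
    q_vec = embed_texts([question])[0]
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9
    )
    top_docs = [documents[i] for i in np.argsort(-sims)[:k]]

    # Stuff the retrieved passages into the prompt: pure in-context
    # learning, with no joint optimization of retriever and generator.
    context = "\n\n".join(top_docs)
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
    return complete_llm(prompt)
```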

With RAG 2.0, we’re really saying that a modern RAG pipeline is quite a complicated piece of software. There’s a lot happening there—document understanding, extraction, handling structured and unstructured data, graph data, API calls. All of that flows through some retrieval component, then often through a re-ranker and some filtering, then to the language model, and finally to post-processing. That entire system is what solves the problem.
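To make the shape of that system concrete, here is a rough dataflow sketch. Every stage name and signature below is our own illustration of the stages Douwe lists, not Contextual AI’s actual components:

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    source: str       # e.g. file path, table name, or API endpoint
    score: float = 0.0

# Each stage is a placeholder for a real component; the point is that
# answer quality depends on the whole chain, not just the final LLM.
def extract(raw_docs: list[bytes]) -> list[Passage]:
    """Document understanding: parse PDFs, tables, charts into passages."""
    return [Passage(text=d.decode(errors="ignore"), source="doc") for d in raw_docs]

def retrieve(query: str, passages: list[Passage], k: int = 20) -> list[Passage]:
    """First-stage retrieval over structured and unstructured sources."""
    return passages[:k]  # stand-in: a real retriever scores and ranks

def rerank(query: str, candidates: list[Passage], k: int = 5) -> list[Passage]:
    """Re-ranking and filtering: keep only the most relevant evidence."""
    return sorted(candidates, key=lambda p: p.score, reverse=True)[:k]

def generate(query: str, evidence: list[Passage]) -> str:
    """Grounded generation: answer strictly from the retrieved context."""
    return f"[grounded answer from {len(evidence)} passages]"

def postprocess(answer: str, evidence: list[Passage]) -> str:
    """Post-processing: attach attribution to the final answer."""
    return f"{answer} [sources: {', '.join(p.source for p in evidence)}]"

def rag_pipeline(query: str, raw_docs: list[bytes]) -> str:
    passages = extract(raw_docs)
    evidence = rerank(query, retrieve(query, passages))
    return postprocess(generate(query, evidence), evidence)
```

RAG 2.0’s claim, as Douwe frames it, is that these stages should be optimized jointly rather than bolted together off the shelf.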

So, you can have an amazing language model, but if it’s not properly contextualized—hence the name of the company—you’re not going to get good results. Our starting point for RAG 2.0 is the observation that it’s a system. If you optimize that entire system using machine learning, you can get much further. That’s really been borne out in our benchmarks and customer engagements.

When you talk about optimizing that system, which areas have you seen as most critical to the difference between the two?

If you start with individual components that are state-of-the-art, that already helps a lot. As we show in our benchmark results from the launch, our document understanding model is state-of-the-art. It outperforms tools like LlamaIndex, Unstructured, and others.

Our retrieval component is also state-of-the-art, better than solutions from companies like Cohere, Voyage, and others. For the generation part, we focus heavily on grounded generation—not creative writing, but generation grounded in context—and our model performs better than OpenAI and Anthropic.

What’s interesting is that the number two in each of these categories is a different company, which tells you how challenging it is to build these end-to-end systems. On top of that, we do joint optimization of all the components, which significantly improves the end-to-end RAG accuracy.

On the RAG-QA Arena benchmark, we outperform models like Claude and GPT-4o by a pretty substantial margin.

You're working with some of the most highly regulated industries in the world, such as finance and healthcare. How receptive have they been to this sea change in the way they access and interact with information?

This has definitely been changing over time. Initially, we often had to explain what RAG was. Now, we don’t have to do that anymore. Companies often come to us after they’ve already tried building their own RAG solutions. Building a demo is easy—you can probably get 90% on your demo evaluation set and think it’s good enough. But when you deploy it with actual users, it’s often very far from usable.

Bridging the gap from demo to production is where many companies are now, and that’s when they start encountering pain points. My message to your newsletter audience is this: if you’ve tried to bridge that gap and failed, come talk to us.

We've seen a lot of excitement around AI agents and the idea that autonomous systems will be able to complete a lot of different tasks. How are you thinking about AI agents in the context of RAG 2.0 at Contextual AI? 

2025 is undoubtedly the year of agents. Everyone agrees it’s the logical next step for these systems, though there’s disagreement on what agents truly mean. Some argue that an agent must be able to take actions that change the state—like generating SQL INSERT queries rather than just SELECT queries. I think that’s a bit misguided.

An agent is essentially something that performs test-time reasoning and interacts actively with both structured and unstructured data. In standard RAG, retrieval is passive: you get a question, retrieve the information, and deliver an answer. With RAG agents, the process becomes much more active.

You get a question, think about what’s needed to answer it—maybe querying a SQL database, pulling data from an API, or searching an unstructured data source. Then you collect these pieces of information, reason through them, potentially retrieve more, and only then provide the final answer. It turns into a multi-hop reasoning problem.
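As a hypothetical sketch of that active, multi-hop loop - the plan_next_step, run_tool, and rag_agent names below are ours for illustration, not Contextual AI’s API - it might look like this:

```python
# Illustrative sketch of an agentic RAG loop: the agent plans what to
# fetch next, accumulates evidence, and answers only once it judges
# the evidence sufficient. All names here are hypothetical.

def plan_next_step(question: str, evidence: list[str]) -> dict:
    """Stand-in for LLM-driven planning: pick the next action."""
    if not evidence:
        return {"action": "search", "arg": question}   # first hop
    return {"action": "answer"}                        # enough evidence

def run_tool(step: dict) -> str:
    """Dispatch to a data source: SQL, an API, or unstructured search."""
    handlers = {
        "sql":    lambda q: f"[rows for: {q}]",
        "api":    lambda q: f"[API payload for: {q}]",
        "search": lambda q: f"[passages matching: {q}]",
    }
    return handlers[step["action"]](step["arg"])

def rag_agent(question: str, max_hops: int = 5) -> str:
    evidence: list[str] = []
    for _ in range(max_hops):                 # bound the multi-hop loop
        step = plan_next_step(question, evidence)
        if step["action"] == "answer":
            break
        evidence.append(run_tool(step))       # gather, then reason again
    return f"[answer grounded in {len(evidence)} pieces of evidence]"
```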

The use cases this enables are vast. It’s a major leap from passive RAG, where you could make do with patchwork solutions. For more complex problems, you need something like RAG 2.0 and specialized RAG agents to truly unlock the potential.

What has been the hardest technical challenge around building Contextual AI into the platform it is today?

What surprised me, especially coming out of Facebook and to some extent Hugging Face, is how immature the AI infrastructure technology stack is—both for training and inference. Getting these systems to work at scale, like processing millions of PDFs in production rather than just a few in a demo, requires significant infrastructural changes. The existing solutions simply aren’t good enough for that level of scale and quality.

As a result, we had to build a lot of our own infrastructure to meet these demands. In short, the immaturity of the technology stack forced us to invent many of the components ourselves.

How do you plan on Contextual AI progressing over the next 6-12 months? Anything specific on your product roadmap that your existing customers are excited about? 

A lot of our focus is on building out the core capabilities we already have and making them even better. Multimodality is particularly exciting—for instance, having chart understanding out of the box, not just for a single PDF but for your entire company’s knowledge base. Being able to find relevant multimodal data and apply agentic RAG on top of it is incredibly powerful.

Another area we’re focused on is the intersection of structured and unstructured data. Companies often have a lot of structured data with BI processes built on top, but they also have a wealth of untapped unstructured data. Finding use cases at the intersection of these two types of data holds immense value for virtually every large company. We're really excited to focus on solving that problem.

Lastly, tell us a little bit about the team and culture at Contextual AI. Are you hiring, and what do you look for in prospective team members?

We tend to hire very smart, driven people. It's a bit of a cliché to say "A players," but really, it's about finding people who are analytical, curious, and have a scout mindset—always trying to discover something new. It’s a rare opportunity to work at the frontier of technology to the extent we’re doing it.

Research is core to our DNA, and over time that gives us a significant edge in speed. It required more upfront work—we trained our own grounded language models instead of relying on OpenAI or others. That meant we started out slower than some competitors, but now we have a highly differentiated foundation.

That foundation enables us to experiment rapidly with research ideas across the entire RAG pipeline or agentic approaches to RAG, and then quickly productionize those ideas. One of my main focuses with the team is maintaining feature velocity, where the “feature” isn’t just a minor UI tweak but the actual research and models we expose to our users. That’s the kind of innovation that excites me.

Our team has a large research component, and our product folks have been in AI for a long time. For example, our VP of Product was one of the original product people behind Amazon Bedrock, and our Head of Security was one of the first security engineers at Snowflake. That gives you an idea of the caliber of people we have and how they contribute to the overall picture—enterprise-grade expertise combined with cutting-edge research. Combining those two things makes this a pretty special company.

Anything else you’d like our audience to know about the work you’re doing with Contextual AI? 

The main takeaway for me is that specialization is going to remain critical for a long time. Especially when focusing on high-value use cases within companies, specialization is what makes these systems work effectively. These use cases often involve professionals with graduate degrees and years of experience. Enhancing their efficiency and productivity adds tremendous value to the global economy, and achieving that requires specialized solutions.

This is where the concept of a RAG agent comes into play. It's an agent that needs to operate effectively on top of your data, and to tackle the most valuable use cases, specialization is essential. That’s exactly what you should turn to Contextual AI for.

Conclusion

To stay up to date on the latest with Contextual AI, learn more about them here.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.