Cerebral Valley
Exa (prev. Metaphor) aims to reshape web search 🔍

Plus: CEO Will Bryk on Exa's vision, team and the rebrand.

February 01, 2024

CV Deep Dive

This week, we’re featuring Will Bryk, the Co-Founder of Exa.

Exa, previously known as Metaphor, is a seven-person startup looking to transform web search. Founded by Will Bryk in 2021 as a consumer search engine, Exa is now offered as an API used by thousands of developers. Their goal is to organize all knowledge on the Internet so that businesses can integrate it into their applications.

Will walks us through all-things Exa, including his larger vision, what makes the team special, and the winding journey to today’s rebrand.

Let’s dive in ⚡️

Announcing big changes to Metaphor:
1. We’re releasing a suite of content retrieval features using new models trained for RAG.
2. We’re renaming to Exa to reflect our mission.
3. We’ve revamped our UI 🎉
Altogether, we're taking big steps toward "search over knowledge": 🧵 twitter.com/i/web/status/1…
— Exa (prev. Metaphor) (@ExaAILabs)
6:02 PM • Jan 26, 2024

Read time: 7 mins

Our Chat with Will 💬

Will! Today’s rebrand is a big day for you and the team. A lot of people knew Metaphor - how would you describe to us what Exa is?

In simplest terms, Exa is a search engine built for AIs to use. That’s what our product is. We currently have thousands of developers using our API to perform complex search queries not feasible with Google or Bing, and our long-term goal is to become the de facto solution for hyper-accurate web search.

It took us a while to reach this point - especially because information tech has gone through multiple earthquakes since we first started two years ago.

Let’s backtrack - how did you end up working on Exa in 2021? What led you down the rabbit-hole of search?

The reason I started Exa was that for the past decade I’ve been obsessed with ways of improving our information ecosystem. In high school, I was telling anyone who listened that we need news that brings together different perspectives. In college, I built a website for students to crowd-source class knowledge because everyone was wasting effort struggling with the same ideas. Products like this have always seemed super important to me, and yet completely neglected.

But eventually, it became obvious that search is the most important portal to the world’s information. If you can improve search, you can have huge downstream effects on the information ecosystem. And that's a big reason why I'm super excited about Exa.

When we first started, people were posting on Hacker News saying “Google search is dying” or “Google isn’t like it used to be”, and they would blame it on ads or SEO. But the problem is, and has always been, the search algorithm. Google’s algorithm is not powerful enough to handle queries of any substantial complexity. And then with GPT-3, it became possible to handle these complex queries.

Nothing I'm tweeting here is original. Yet it's impossible to find all the places on the internet where someone had similar ideas. If we could, think of the potential for learning/friendship/innovation. An organized internet would spawn a revolution greater than the internet did.
— Will Bryk (@WilliamBryk)
10:48 PM • May 9, 2021

GPT-3 feels like the watershed moment for a lot of startups. How did it shape Exa’s conception and initial path?

Our true origin story started when GPT-3 came out in 2020 - and at the time it was this magical tool that understood text at near-human level. And then you had Google, which felt like it was static and hadn't improved in a decade. And so our insight was like, what if we took the power of GPT-3 and applied it to search? What would that feel like? It just suddenly became possible.

Today, we’re at the point where it’s almost possible to do human-level search over the whole Internet. Imagine if you told your smartest friend, “do research for ten years to generate a paper on a specific topic”, and they came back to you in a couple of seconds. This is now possible because we have automated intelligent systems that can work in parallel.

The only caveat is that the couple-of-seconds part might require a ton of compute - you still have to run GPT-4 over 100 billion documents. So the challenge becomes, how do we do that efficiently without spending so much on compute?

What was the reason for the rebrand from Metaphor to Exa? What did you feel was missing in the original direction you were taking?

The biggest difference was that we were originally redesigning search for human consumers. But Google works just fine for lots of queries that humans search. What had always been exciting to us was handling all the queries Google doesn’t even try to handle – the really complex ones.

And then ChatGPT came out.

Our big realization was that companies were now the ones who can utilize this completely redesigned search. Now, because of ChatGPT, we can enable a new ecosystem of workflows within every company’s AI stack. Every company is going to have native AIs deeply integrated into their systems, and those AIs are going to all need the ability to search in more complex ways than humans.

And with the name “Exa” - it means ten to the 18th power, which is in stark contrast to Google’s ten to the hundredth (a googol). A key problem with the internet right now is that we’re overwhelmed with information. But the amount of actual knowledge is a lot smaller. “Exa” is less than “Google”, but in this case less means better. We’re making a much more curated experience.

10^18 searches where Exa is better than Google:
Just kidding, but here are 10: 🧵 twitter.com/i/web/status/1…
— Exa (prev. Metaphor) (@ExaAILabs)
5:55 PM • Jan 31, 2024

Tell us about your users - who’s finding value in using Exa’s search API?

Most of our current users are startups - and that’s what we expected. If you look at when the Internet started, it was originally smaller companies that adopted it quickest, with the larger incumbents following on later. Though with AI, incumbents are moving faster than with the internet. So we also have some large companies using us.

User-wise, we’ve gotten a ton of inbound - over 4,000 developers have signed up to our API in the last few months. And they’re generally just looking for a high-quality search experience that they can't get from Google or Bing.

What were some of the earliest use-cases that worked best?

It’s silly, but simply getting what you ask for was the big early use case. A good example is a VC fund using Exa for market research - e.g., a search for startups applying AI to law. If you type “startups applying AI to law” into Google, you don't get a list of those startups - you get a bunch of blog posts about AI and law, or LinkedIn profiles, or other irrelevant results. In other words, you get SEO’d into oblivion, because Google isn’t functioning like it was intended to.

But with Exa’s transformer-based algorithm, I can expect it to output results for exactly what I asked. This is what I, the VC fund, wanted all along. So that's the broad pattern we’re seeing across the companies using us - in 2024, they want high-quality knowledge that’s far past what “Google-level search” can offer them.

Yet another fascinating AI product:
metaphor.systems
It crawls the Web, but you can get a list of results via a ChatGPT-prompt. It does extremely well for when you're at a "loss of words" for what to Google for.
— Guillermo Rauch (@rauchg)
7:56 PM • Nov 6, 2023

What are other use cases you’re seeing with Exa so far?

At its core, Exa serves three broad use-cases - receiving ten results per query, a thousand results per query, or millions of results per query. We trained a model for search and it ended up being highly performant for each of these use cases.

For getting ten results per query, this is specifically for RAG. Imagine an AI writing assistant to help people write papers. ChatGPT needs to find relevant content from the web in order to recommend what to write next. The chatbot makes searches to Exa for papers and high quality blog posts with similar ideas and then ChatGPT integrates that knowledge into its answers.
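The ten-results flow described above can be sketched roughly as follows. This is a minimal illustration, not Exa's API: `search_web` is a hypothetical placeholder that returns canned results, `build_prompt` is a made-up helper name, and the example.com URLs are invented for the example.

```python
def search_web(query, num_results=10):
    """Hypothetical placeholder for a search API call; returns canned results."""
    corpus = [
        {"url": "https://example.com/paper-a",
         "text": "Paper A argues that retrieval improves factual accuracy."},
        {"url": "https://example.com/blog-b",
         "text": "Blog B surveys techniques for grounding model output."},
    ]
    return corpus[:num_results]


def build_prompt(draft, results):
    """Number each retrieved source and place it ahead of the user's draft."""
    sources = "\n".join(
        f"[{i}] {r['url']}: {r['text']}" for i, r in enumerate(results, start=1)
    )
    return (
        "Use the sources below to suggest what the author should write next.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Draft so far:\n{draft}\n"
    )


draft = "Retrieval-augmented generation grounds a model's answers in"
prompt = build_prompt(draft, search_web(draft, num_results=10))
print(prompt)
```

In a real assistant, the resulting prompt would be sent to the language model, and the placeholder search would be replaced by an actual search call.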

For a thousand results per query - this is more for automated analysis. Businesses are using this feature to find every company within a space to sell to. You could take a company working on synthetic meats, paste it into Exa's API and get a list of the thousand most closely related companies working on synthetic meat. Exa then provides the text content of each result, so now ChatGPT has a thousand pages to scan. This is actually how we’ve found customers at Exa!

For millions of results per query, this is for creating automated datasets. You could think of it like this: we're taking the web, slicing it in a high-dimensional space, and giving you a chunk of the Internet. You could then use that chunk to fine-tune a model, for example.

One last thing to note is that each of these use cases would have sounded like fantasy two years ago. Without an LLM, how can you possibly do RAG? Or automate analysis? And what would you fine-tune? LLMs create new types of search needs. We may have started as a search engine for humans, but this entire time we were actually building a search engine for LLMs.

We’re excited to partner with @ExaAILabs - they’ve created the most advanced RAG-powered web search we’ve seen so far. 🌐🔥
Unlike web search for humans, @ExaAILabs is tailor-made for LLMs, returning the most relevant highlights through custom chunking / extraction / retrieval.… twitter.com/i/web/status/1…
— LlamaIndex 🦙 (@llama_index)
10:38 PM • Jan 26, 2024

On a technical level, what makes Exa’s search API better than SERP or Bing?

We’re the first and only team to have created a web-scale neural search engine. Google uses a mixture of keyword-based and neural search - where on the first pass, the algorithm filters by keywords, and then does neural re-ranking. Obviously, Google’s is a complicated algorithm that involves many elements, but at its foundation it's a keyword-based search algorithm.

We do neural end-to-end - which allows us to understand natural language like a transformer understands natural language. And what I mean by this is: we’ve embedded the Internet into Exa so that you can search by meaning and not by keyword. When you need to use meaning-based queries - e.g. “startups applying AI to law”, or “designers with a cool retro style” - this is where we shine. We have a million-dollar GPU cluster that we train our own models on to help us do this.
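To make the contrast concrete, here is a toy sketch in Python. It is not Exa's or Google's actual system: the three-number vectors are hand-made stand-ins for learned embeddings, the corpus is invented, and the names `DOCS`, `keyword_first` and `neural_end_to_end` are hypothetical. It compares a pipeline that filters by keyword before re-ranking with one that ranks every document by vector similarity.

```python
import math
import re

# Invented toy corpus. Each vector is a hand-made stand-in for a learned
# embedding, with dimensions loosely meaning (legal, AI, startup).
DOCS = [
    {"title": "Blog post: how AI will change law", "vec": [0.9, 0.8, 0.1]},
    {"title": "LexBot: a seed-stage company automating contract review",
     "vec": [0.8, 0.7, 0.9]},
    {"title": "Top 10 startups to watch in food delivery", "vec": [0.0, 0.1, 0.9]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def words(text):
    """Lowercased word set used for the keyword pass."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_first(query_terms, query_vec, docs):
    """First pass keeps docs sharing a keyword; second pass re-ranks them."""
    survivors = [d for d in docs if words(d["title"]) & set(query_terms)]
    return sorted(survivors, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)


def neural_end_to_end(query_vec, docs):
    """Rank every document by similarity, with no keyword gate."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)


# Query: "startups applying AI to law"
query_terms = ["startups", "ai", "law"]
query_vec = [0.8, 0.8, 0.9]

print("keyword-first:", [d["title"] for d in keyword_first(query_terms, query_vec, DOCS)])
print("end-to-end:   ", [d["title"] for d in neural_end_to_end(query_vec, DOCS)])
```

In this toy setup the LexBot entry never mentions "startups", "AI" or "law", so the keyword pass drops it before re-ranking can help, while the similarity-only ranking puts it first.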

Previously, Exa was compared to popular AI search platforms like Perplexity. Does Exa’s refocus put that to the side?

Startups like Perplexity are primarily innovating on the post-processing of search results - in essence, they’re processing results via an LLM (or multiple) to save you time and the need to look into the links themselves. What Perplexity is doing is definitely the right UI approach - however, we're not even focusing on UI anymore.

We’re innovating on the search results themselves - by crawling the web and using it to create our own search algorithm. We’re an API for businesses operating on the infrastructure level.

People think Google has a big moat in search, but that moat is actually a cage.
"They have all the search data" --> That means their algorithm is designed for past search patterns, not future ones
“They have all the click data" --> Which has optimized them directly into SEO… twitter.com/i/web/status/1…
— Will Bryk (@WilliamBryk)
7:17 PM • Dec 20, 2023

What would you say is the hardest technical challenge around building Exa?

Simply put, the hardest technical challenge is getting really precise search results. This means: search results that match exactly what you ask for, no matter how complex your query is. No one's trying to do that right now - literally no one.

For example, a query like: “startups applying AI to law that were founded by people who went to X or Y college and who have previous experience in Rust”. This is the hardest challenge but also our ultimate goal.

Tell us about Highlights, which is a new feature you’ve launched today along with the rebrand announcement.

A critical consideration when creating a search API for AIs is that AIs gobble up information much faster than humans. This means you want to avoid blocking them - either via not feeding them the right information, or by waiting too long to give them that information. This is what the Anthropic CEO was referring to when he mentioned “letting the compute flow”. Don’t impede the compute!

This is also why LLMs want to retrieve content from every result - not just the URL and the title, but the actual text. But often, the full webpage text for each result is simply too much, and sending it actually blocks the LLM, because the compute no longer flows over the right information.

So to fix this, we launched Highlights - which, as the name describes, lets Exa instantly extract the highlights of any search. Let’s say you have 100 unique search results - we'll extract any number of highlights from each one of those results. If there are 3 highlights per result, you’ll get back 300 highlights. And that's super powerful.

The Highlights feature actually uses its own query independent from the main query. You can imagine searching for 100 research papers about COVID, and then having a highlight query that asks, “I want the most controversial elements of these”, and it gives you those parts of the papers.

This all happens in real-time using embeddings - essentially, we’re giving you real-time extraction of knowledge from across the web. This is Google snippets on steroids.
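As a rough illustration of the idea (not Exa's implementation), the sketch below scores each sentence of each result against a separate highlight query and keeps the top few per result. The bag-of-words `embed` function is a deliberately crude stand-in for a real embedding model, and `extract_highlights` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract_highlights(results, highlight_query, per_result=3):
    """Return up to `per_result` sentences per result, ranked by the highlight query."""
    q = embed(highlight_query)
    highlights = []
    for text in results:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        ranked = sorted(sentences, key=lambda s: cosine(q, embed(s)), reverse=True)
        highlights.extend(ranked[:per_result])
    return highlights


# Invented sample results standing in for retrieved page text.
results = [
    "The study enrolled 500 patients. Critics call the trial design controversial. "
    "Funding came from a public grant.",
    "Masks were distributed in two cities. The controversial finding is disputed "
    "by other labs. Data is public.",
]

highlights = extract_highlights(results, "controversial disputed findings", per_result=1)
for h in highlights:
    print(h)
```

The total returned is the number of results times `per_result` (capped by how many sentences each result has), which is the same arithmetic as 100 results with 3 highlights each giving 300.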

I was looking to make a personal website, so I asked @ExaAILabs for links that fit my background (startup founders, product background, writing, research).
Over half of the links in the first page were styles I actually enjoy. It'd be very hard to find them on Google Search… twitter.com/i/web/status/1…
— Charlene Wang (@hsinleiwang)
10:13 PM • Jan 31, 2024

It seems like there’s a heavy research element at the core of Exa. Tell us about how you balance that with the product?

We're a research startup, so we're always going to have a huge focus on new approaches to search. Plus, we need this to fulfill our mission, which is to solve search and organize the world's knowledge. We’re also working on the next version of our model and system, which requires heavy AI research.

Separately, I think a good way to think about whether a startup is actually a research startup is: if they don't do research, can they achieve their mission? Some software startups are fine without research, as there’s a lot of alpha in integrations and partnerships and great UI. On the other hand, text-to-video startups will not work if they don't constantly innovate. We’re in that camp - we have to do research to succeed.

Tell us about how the Exa team works internally - how do you balance research and product development?

We’re all currently doing a mix of research and engineering. Our research right now is very applied - like optimizing our special vector DB, or figuring out how to combine neural filters with keyword-based filters in a way that nobody’s done before because they haven’t had this problem at web scale. We eventually want to have a research lab to help us see around the corners and investigate ideas without immediate application. But we're a little early for that.

I really cherish our culture - it's a beautiful thing. We're like a family - we eat together, live together, and have the best time together, and that's very rare. In preparation for the Exa rebrand, our team was up literally all night - just because the energy was insane and everyone wanted to be up. Everyone cares about the technology and the mission. It doesn’t feel like work, and that's how great companies are formed.

Almost there!
— Will Bryk (@WilliamBryk)
3:59 PM • Jan 25, 2024

And are you adding to the team?

We definitely are - we want people who are super excited about building these types of products, excited about the mission, and who are really freaking good. When you’re a company trying to solve very hard problems, it's better to have one person who's amazing than a few who are solid. We want people excited by the concept of crawling 100 billion websites really fast or redesigning the vector DB using low-level Rust. We have a really high bar.

Right now we're looking for AI researchers - someone who can help us think of new architectures and implement them for search. And then secondly, we’re hiring a generalist engineer who can build out any of those systems we’ve talked about.

I’d also say that intelligence and persistence are way more important than experience. So if you only have some PyTorch experience, but you're a freaking beast coder, you’ll probably become a great AI researcher in a short amount of time.

Where do you see Exa going in the next 12 months? In December 2024, where do you want the company to be?

In twelve months, I want Exa to be basically a perfect search tool over certain domains. So that no matter how complex your search is, it feels like a magical search.

That means that when you want to find companies, you go to Exa without any doubt that it will give you what you’re looking for. When you want to find accurate news, you go to Exa. When you want to find recent papers, you go to Exa. If we can get there for those domains, that would be awesome.

After twelve months - once we've basically solved search over those categories - we’ll be ready to go and solve it for all categories in 2025.

Conclusion

That’s a wrap for our fourth Deep Dive of 2024! Follow Will on Twitter to learn more about his work with Exa and reach out if you’re recruiting.

Read our past few Deep Dives below:

2/25: KREA is building the next frontier of human creativity ⚡️
2/18: Julius is transforming computation with AI 📈
2/11: How USearch Reached 500k+ Python Downloads 🌐

If you would like CV to ‘Deep Dive’ a founder, team or product launch, please reply to this email or DM us on Twitter.