
Hyperspell lets you build your AI data pipeline in minutes, not months 🔧

Plus: CEO Conor on the opportunity within building RAG pipelines...

CV Deep Dive

Today, we’re talking with Conor Brennan-Burke, Founder and CEO of Hyperspell.

Hyperspell is building an end-to-end data pipeline solution designed to simplify the complex process of connecting and preparing data for AI applications. By focusing on Retrieval-Augmented Generation (RAG), Hyperspell enables companies to seamlessly integrate, index, and retrieve data for AI-driven applications in just a few lines of code. Hyperspell eliminates the need for teams to manually manage data pipelines, letting them focus on building their AI applications instead.

Conor and his cofounder Manu Ebert are helping startups and enterprises alike streamline their AI workflows, making data accessible and enabling faster deployment of AI-powered solutions. Hyperspell’s approach is empowering companies to solve complex data challenges while maintaining data security and privacy.

In this conversation, Conor dives into the origin of Hyperspell, the challenges of building RAG pipelines, and how they’re positioning the company to stay ahead of the curve as AI technology evolves.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Conor 💬

Conor - welcome to Cerebral Valley! First off, give us a bit about your background and what led you to found Hyperspell? 

Hey there! So, I’m originally from western New York, and I started my career working at Checkr. Checkr is a company that builds APIs for background checks, working with some of the biggest tech players like Uber and DoorDash. My role there involved taking complex data sources and breaking them down into usable APIs. Before that, I spent time working in consulting at BCG, particularly in BCGx, their technology design and development unit, which focuses on building startups and software products.

This is actually my second company. Before Hyperspell, I was building an AI music startup. I met my co-founder, Manu, through the On Deck Fellowship. Manu’s got a deep background in machine learning, having been in the space for over 12 years. He previously sold an AI startup to Airbnb in 2017, where he built the first knowledge graph at the company.

We connected over a shared vision of wanting to eliminate busywork and use AI to replace tedious tasks. Initially, we spent about five months building an end-user AI application that could write documents, send messages, and compose emails. We had customers interested and excited about the product, but in the process, we discovered that we were spending a huge amount of time building the data pipeline for RAG (retrieval-augmented generation). This included everything from indexing and chunking the data, to summarizing and integrating various steps with different vendors.

We realized we were spending more time on the pipeline than on the actual end-user product. After talking to other companies building AI, we found they had similar struggles, and there wasn’t a comprehensive, end-to-end RAG pipeline available in the market. That’s when we decided to shift our focus and build Hyperspell, solving the problem for the broader AI community.

How would you describe Hyperspell to an AI engineer or developer who isn’t as familiar?  

Hyperspell is an end-to-end data pipeline specifically built for RAG (retrieval-augmented generation). So, there are tons of tutorials on the web teaching you how to build a simple RAG application. However, everybody quickly learns that those don’t work well beyond some toy example, and even if the code seems easy, maintaining and scaling the infrastructure underneath is anything but. Building a production-ready RAG system takes months and decent expertise in natural language processing to deliver quality results.

Hyperspell reduces that entire process to just three lines of code. All you need to do is add a button that lets your users connect their accounts and data sources, and we handle everything in between, delivering the output via an API, ready for RAG.
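As a sketch of what that integration shape could look like from the app developer's side (the client class, method names, and parameters here are hypothetical stand-ins, stubbed so the example runs — Hyperspell's actual SDK may differ):

```python
# Hypothetical sketch of a Hyperspell-style integration. The client class,
# method names, and parameters are illustrative stand-ins, not a real SDK.

class HyperspellStub:
    """Stub standing in for a hosted RAG-pipeline client."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self._docs: list[str] = []

    def connect_source(self, user_id: str, source: str) -> None:
        # In a real pipeline this would trigger OAuth, backfill, and indexing.
        self._docs.append(f"{source} data for {user_id}")

    def query(self, user_id: str, question: str) -> list[str]:
        # In a real pipeline this would run retrieval over the user's index.
        return [d for d in self._docs if user_id in d]

# The "three lines" from the app developer's point of view:
client = HyperspellStub(api_key="sk-demo")
client.connect_source(user_id="u1", source="google_drive")
context = client.query(user_id="u1", question="What is project X's status?")
```

The point of the abstraction is that everything between `connect_source` and `query` — auth, backfill, chunking, indexing — happens on the service side.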

Talk to us about your users today - who’s finding the most value in what you’re building with Hyperspell? 

We're currently pre-launch, but we built Hyperspell for ourselves initially and have several companies already signed up for our private beta. To give you some example use cases, Motif, an AI startup focused on helping companies file patent applications, is working closely with us. They connect to a company’s data sources, such as Google Drive or GitHub, and gather the necessary information to help file patents. We’re working with their founder, Daniel, who mentioned that he spends most of his time on the data pipeline when his real expertise is in patents and legal work. Hyperspell frees him up to focus on the legal side by handling the entire data pipeline.

On the larger enterprise side, we’re working with a technology consulting company that helps businesses build RAG applications. They’ve struggled to hire experts in this new area of data integration and pipeline building. With Hyperspell, they can now focus on their customers — banks, insurance companies, and other large firms — while using us as the back end to handle their data needs.

Could you explain what you mean by RAG-as-a-service, in the context of Gen AI? 

The key insight behind Hyperspell is simple: no matter how advanced an LLM gets, it needs data and context to function effectively. Whether you’re writing a document or an email, or working on a project, the LLM needs to understand your world — who's involved, what projects are active, how you talk to different people. RAG is the most common method of providing that context. In essence, you first search a database (or the whole internet) for documents related to a user input, and pass the relevant information on to an LLM. Those documents could be anything from your Slack history to documents on Google Drive, emails, code, and tasks.
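The retrieval step described above can be sketched in a few lines. This is a deliberately minimal illustration using keyword overlap; production systems (including, presumably, Hyperspell's) use embeddings and a vector database instead:

```python
# Minimal sketch of the RAG retrieval step: score a small document store
# against the user's question, then build an LLM prompt from the top matches.
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    q = tokens(question)
    # Rank documents by how many question words they share.
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = [
    "Slack: the launch is scheduled for Friday",
    "Email: invoice from the caterer",
    "Drive doc: launch checklist and owners",
]
prompt = build_prompt("When is the launch?", docs)
```

The prompt is then sent to the LLM, which answers using only the retrieved context rather than its training data.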

With Google Gemini's massive 2M token context window, do you still need RAG? Consider that 2M tokens fit about six months of my email and would cost $2.50 for every single call. So yes, for every serious use case, RAG is here to stay.
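The cost figure works out as back-of-envelope arithmetic, assuming input pricing on the order of $1.25 per million tokens (an assumption for illustration; actual pricing varies by model and tier):

```python
# Back-of-envelope check of the cost claim above. The per-token price is an
# assumption consistent with the $2.50 figure; real pricing varies by model.
price_per_million_tokens = 1.25  # USD, assumed input price
context_tokens = 2_000_000       # filling the full 2M-token window
cost_per_call = context_tokens / 1_000_000 * price_per_million_tokens
# A few thousand user queries at this rate would cost thousands of dollars,
# versus fractions of a cent per call when RAG sends only the relevant chunks.
```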

In practice, maintaining this database is incredibly hard. Let me give you an example. Say you want to build a service that dynamically creates a project update on demand, without anybody having to actually write it — just based on conversations in Slack, emails, GitHub and so on. You would have to handle authentication and permissions for these services, backfill vast amounts of data while worrying about rate limits, then continuously ingest data and keep it constantly updated, handle permissions and sensitive information, think about chunking and indexing, and even nasty things such as prompt injections.
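One of the steps listed above, chunking, can be sketched simply: splitting raw text into fixed-size, overlapping pieces before indexing. This is the naive version; production pipelines typically chunk on semantic boundaries (paragraphs, messages, table rows) rather than raw character counts:

```python
# Naive chunking sketch: fixed-size windows with overlap, so that a fact
# falling on a chunk boundary still appears whole in at least one chunk.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 100-character document with 40-char chunks and 10-char overlap
# yields windows starting at positions 0, 30, and 60.
pieces = chunk("a" * 100, size=40, overlap=10)
```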

Hyperspell does all of that for you right out of the box, so you can query all of your users’ data with one single API, or as part of your existing LlamaIndex or Langchain tooling.

RAG has become an extremely critical part of generative AI, and has received a lot of business interest. What sets Hyperspell apart from others in the space? 

There are a lot of builders in the RAG space, and it's really exploded with incredible point solutions. For example, Reducto is excellent at understanding PDFs, Pinecone offers a powerful vector database, and Merge handles data connectors. These are all great, but they each tackle a small part of the pipeline, and it still takes an insane amount of work and expertise to make them all play well together at scale.

What we saw missing, and what we kept encountering in our own work, was a comprehensive, end-to-end solution. Many users, especially AI builders, don’t want to piece together multiple vendors to create a functioning pipeline; they want something that actually works so they can focus on building amazing products, not shuffling data around.

It’s similar to the shift that happened when AWS entered the scene. Before AWS, you had to handle physical servers, scaling issues, and managing infrastructure. Now, AWS abstracts all that complexity, enabling startups to build faster. Plaid did something similar in fintech, creating a connector layer for banks, which empowered the next generation of fintech startups that would simply not exist without Plaid. Hyperspell aims to do the same for RAG — abstracting the complexity of the data pipeline to let builders focus on what matters most.

There’s been an explosion of interest in multimodal and agentic workflows. How has that shaped the way you’re thinking about building Hyperspell? 

In fact, when we were initially thinking about how to tackle busywork, we naively hoped AI agents would just come along and solve it. In reality, depending on the task, humans often don’t spend most of their time actually doing the work; they spend it looking up all the information they need to get the task done. We believe that for all but the simplest agentic workflows, agents still need real context, and that’s of course what Hyperspell provides. But at some point, we really want this to be a two-way street. Our vision is to be the universal data layer, the universal infrastructure between data and AI. Right now, that's about reading and searching, but in the future, it’s also going to be about creating and doing.

The truth is that the internet wasn’t built for millions of agents running around, reading pages in milliseconds and clicking things at lightning speed. We can see Hyperspell being the universal connector that abstracts away not just the million different ways of reading and finding data, but also of interacting with the web and all its connected parts. But that’s still a while out.

What has been the hardest technical challenge around building Hyperspell into the product it is today?

We’ve already talked about a few reasons why RAG infrastructure is intrinsically hard. At Hyperspell, there are three additional hurdles. The first is the sheer breadth of data out there, and the fact that we’re still far from having established best practices for dealing with it. Sure, an essay is easy enough to parse and index, but how about a busy Discord channel, with people talking over each other? Tables in PDFs? Data that’s inherently relational? A lot of the theory around how to make these efficiently searchable still needs to be invented.

A different challenge, though also a blessing in disguise, is that building a general-purpose solution means we can’t take the shortcuts that work when you know exactly what the data will be used for. However, that has also led us to really understand and appreciate the many ways RAG can be done, and we’ve built an incredible amount of expertise there.

Finally, and this is less a technical challenge than a business challenge, privacy and security are top concerns, both for end users and enterprises. Many companies are understandably cautious about how their data is being used — where it's going, who's accessing it, and whether it's being sent to third-party services like OpenAI. We have full audit logs built in, of course, but this is a relatively new field too, and we’re iterating through lots of different ways to keep people’s data secure, such as homomorphic encryption of indexed data and natural-language permissioning: as a user, you can tell Hyperspell not to share, for example, financial projections or anything relating to your family, and we will categorize the data accordingly and redact it.

There’s a lot of fear around AI and data, and we’re working closely with companies to understand their evolving needs and concerns, as this is as much a learning journey for us as it is for them.

How do you plan on Hyperspell progressing over the next 6-12 months? Anything specific on your roadmap that new or existing customers should be excited for? 

Our v1 is really about providing a foundational layer that doesn’t exist in the market yet — a true end-to-end solution. Instead of taking three months to set up data pipelines, with us, it's three lines of code. You can bring your data in and have it ready for retrieval-augmented generation (RAG) right away. We believe there’s an entire generation of startups and builders that are going to be empowered by this and will be able to focus on building their end-user solutions instead of infrastructure.

One thing we’re incredibly excited about is letting developers query structured data with GraphQL by building a knowledge graph under the hood. Imagine you want to build an automatic personal CRM: you don’t just use Hyperspell to quickly find when your user last talked to a contact and what they talked about, but you can just as easily get a list of all contacts and relevant data in a single query, or all projects somebody is working on, and so on.
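A query against such a knowledge graph might look something like the following. The schema, field names, and endpoint are illustrative assumptions for the personal-CRM example, not a published Hyperspell API:

```python
# Hypothetical GraphQL query for the personal-CRM example above. The schema
# and field names are illustrative assumptions, not a real API.

query = """
query PersonalCRM($userId: ID!) {
  user(id: $userId) {
    contacts {
      name
      lastInteraction { date summary channel }
      sharedProjects { name status }
    }
  }
}
"""

variables = {"userId": "u1"}
# In practice this would be POSTed to the service's GraphQL endpoint, e.g.:
# requests.post(endpoint, json={"query": query, "variables": variables})
```

The appeal over plain retrieval is that one structured query returns all contacts with their related data, instead of issuing a separate semantic search per contact.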

My co-founder Manu has a decade of experience in that domain and built the first-ever knowledge graph at Airbnb, and we’re pretty confident that will improve the quality of our RAG beyond what most people can build in-house.

Lastly, tell us a little bit about the team and culture at Hyperspell. How big is the company now, and what do you look for in prospective team members that are joining?

Right now, it’s just the two of us — myself and my co-founder Manu. We’re both repeat founders and have run large teams before, and are beginning to build out our team at Hyperspell.

This early on, it’s crucial to work with people who want to build towards a vision, not a deadline. For us, that vision is creating the ubiquitous and indispensable infrastructure for all AI-powered apps, and if that sounds like a worthy ambition you should come talk to us. The amazing thing about building infrastructure is that very quickly hundreds and thousands of people interact with your product, which strikes the balance between terrifying and satisfying that keeps engineers going.

Much of the technology we’re using and building simply didn’t exist two years ago, so if you’re a passionate engineer but feel like you don’t have an AI background, there’s plenty of opportunity to learn on the job.

How we work together is as important to us as what we work on, though. As a young startup we inevitably spend a lot of time together, in a fast-moving, sometimes stressful environment and industry, and you’ll notice we invest a lot in our ability to communicate well with each other, care for each other, and constantly find new ways of making it easy for everyone to be the best version of themselves.

Anything else you want people to know about Hyperspell?

We’re excited about partnering with other AI infrastructure companies. If your organization helps customers store data and you want to enable them to build chatbots or AI applications on top of that data, we’d love to collaborate. Privacy, security, and data encryption are key priorities for us, and our enterprise solutions are built to maintain those standards.

If you're someone inside a large enterprise looking to dive into AI but aren’t familiar with RAG, no worries—we’ve got you covered. With Hyperspell, you can create a chatbot in just three lines of code, and it’ll take minutes instead of months.

Conclusion

To stay up to date on the latest with Hyperspell, learn more about them here.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.