Qdrant's GPU-accelerated vector indexing is here 🚀
Plus: CEO André Zayarni on how Qdrant's latest release is a huge step forward for vector search...

CV Deep Dive
Today, we're talking with André Zayarni, Co-founder and CEO of Qdrant.
Qdrant is a vector search engine purpose-built to handle unstructured data at scale, providing developers with real-time search and semantic understanding for AI applications. Founded by André and his co-founder Andrey Vasnetsov in 2021, Qdrant is specifically optimized for performance and scalability - it's built from scratch in Rust and leverages a custom filterable HNSW algorithm, which sets it apart from its competitors. Packed with features like integrated quantization, hybrid search, and support for billions of vectors, Qdrant powers advanced AI workflows for developers, from RAG-based applications to recommendation systems and anomaly detection.
Today, Qdrant announced the launch of GPU-accelerated vector indexing, a huge step forward in cutting the compute time required for indexing: up to 10x faster index building at comparable cost. With this release, Qdrant enables indexing within real-time applications like AI agents, dynamic content search, and long-term memory for generative AI systems. Available in Qdrant's latest open-source version, 1.13, the capability is platform-agnostic and supports a wide range of GPUs, including NVIDIA, AMD, and integrated GPUs (Intel, Apple Silicon), for real-time index building.
In this conversation, André dives into Qdrant's mission, today's exciting announcement around GPU-accelerated indexing, and how the company is empowering developers to build the future of AI.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with André 💬
André - introduce yourself and give us a bit of background on yourself and what led you to co-found Qdrant. What was your "aha" moment?
I'm André Zayarni, Co-founder and CEO at Qdrant. Qdrant came together in 2021 in Berlin after my co-founder Andrey Vasnetsov, CTO at Qdrant, and I worked on a project where we wanted to leverage vector similarity search to build a matching engine for unstructured data objects. However, after looking at the feasible options, including libraries like FAISS, we could not find what we needed in terms of features and scalability, so Andrey decided to build his own vision of a production-ready vector search engine from scratch.
Once he had the first version, we published it on GitHub and saw an overwhelming number of developers take an interest, showing that they had a similar need. The feedback and questions we started receiving from developers and other startups made us realize we were onto something. As the project grew, we decided to start Qdrant to continue building the vector search engine we know today. And while we originally built Qdrant for use cases like similarity search and recommendations, we had another aha moment with the onset of RAG applications, when we noticed how relevant Qdrant's native design would be for GenAI applications.
How would you describe Qdrant to the uninitiated AI developer and/or enterprise experimenting with AI?
So Qdrant is, first and foremost, a vector search engine. While the industry often talks about vector databases, the main advantage of solutions like ours is searching unstructured data for semantic meaning as efficiently as possible. For Qdrant specifically, we wanted to build for scale and high performance from the start, and that goal guided our architectural decisions: building the vector search engine from scratch, leveraging Rust as our development language, and designing our custom filterable HNSW algorithm.
Thanks to this, Qdrant is able to handle billions of vectors and allows searching through them in near real-time, which is crucial for the production-ready AI applications we're seeing today.
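To make that concrete, here is a minimal sketch (ours, not from the interview) of a filtered vector query with the official qdrant-client Python package; the collection name, vector size, and payload field are hypothetical stand-ins:

```python
# Hedged sketch: a filtered vector search against a local Qdrant instance.
# "documents", the 384-dim vectors, and the "lang" payload field are all
# hypothetical examples, not names from the interview.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

# The filter is applied inside the HNSW graph traversal itself - the
# "filterable HNSW" idea André refers to - rather than as a post-processing
# step after the nearest-neighbor search returns.
hits = client.search(
    collection_name="documents",
    query_vector=[0.1] * 384,  # embedding of the query text
    query_filter=Filter(
        must=[FieldCondition(key="lang", match=MatchValue(value="en"))]
    ),
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score)
```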
I am so inspired by Qdrant's performance. It is simply off the charts. How they do such a complex operation in under 100ms is exceptional.
I've never witnessed first hand a tool like it. Even db API requests take 300-400ms at a minimum.
- Charlie Greenman (@razroo_chief)
6:18 PM • Jan 17, 2025
Which customer segments are finding the most value in Qdrant's products today? Are there any customer success stories that you'd like to highlight?
That's a really good question, and I get asked this a lot. The honest answer is that we see strong adoption of Qdrant and vector search across many industries and company sizes - wherever you have a lot of unstructured data and the need to capture context or semantic meaning. We see AI-native startups that quickly adopted Qdrant to build dedicated AI solutions around AI search and vertical RAG solutions - for example, for legal, finance, or coding assistants - all the way to, more recently, using Qdrant as long-term memory for AI agents. But we're also seeing larger companies, like digital natives and enterprises such as Bosch or Bayer, using Qdrant. As mentioned earlier, the use cases vary quite a bit, from internal knowledge bases, to GenAI and chatbots, all the way to more traditional vector search use cases like advanced search, recommendation systems, or anomaly detection.
All in all, we're continually inspired by the innovative use cases our community develops and their valuable suggestions for enhancing Qdrant. With over 10 million downloads, the diversity of applications built with Qdrant is truly exciting. A few examples are Sprinklr, which uses Qdrant to enhance its AI-driven customer experience solutions, Nyris, which offers visual search solutions for manufacturing, and QA.tech, which is building AI agents for web application testing.
What sets Qdrant apart from some of the other players in the market, also as it pertains to this latest announcement?
We have a few principles for how we develop our product, and these are also the reasons our community tells us they value Qdrant. One aspect is that we're open-source. This matters because it lets developers tweak and expand our platform as they need. Being open means no vendor lock-in, builds trust in AI, and speeds up innovation as the community shapes it, keeping our tech on the cutting edge.
A second point is that we're all for deployment flexibility. Developers should be able to run their vector search applications in the environment of their choice, be it self-managed, fully managed with our Qdrant Cloud for ease of use, or managed but in their own environment for data privacy or latency reasons. We make this possible with our Qdrant Hybrid Cloud and Private Cloud deployment options. But maybe we can go into more detail on this later.
The third part is our feature set. While there are many components in the modern AI search stack (embedding models, vector database, reranking, etc.), we are laser-focused on offering the best available solution for native vector search. This is why we invest a lot of our resources in R&D and have come up with concepts like integrated quantization, pure-vector-based hybrid search, the ability to offload data to disk for efficiency, multi-tenancy support, and much more.
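As a hedged illustration of two of those features - quantization and disk offloading - here is how they can be enabled at collection-creation time with qdrant-client; the collection name and vector size are hypothetical:

```python
# Hedged sketch: create a collection with scalar quantization and vectors
# offloaded to disk. "products" and the 768-dim size are hypothetical.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType,
    VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="products",
    vectors_config=VectorParams(
        size=768,
        distance=Distance.COSINE,
        on_disk=True,  # keep original vectors on disk to save RAM
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # 8-bit scalar quantization, ~4x smaller
            always_ram=True,       # keep the compact vectors in memory
        )
    ),
)
```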
Our latest announcement - our GPU-accelerated vector indexing - is the next step.
Can you tell us more about what GPU-accelerated vector indexing is, why it matters, and what makes it unique?
Vector search is compute-intensive, mainly during index building. You start with unstructured data like PDFs, images, videos, or text, convert these documents into vectors using embedding models, and then load them into the vector database. Qdrant then automatically builds an ANN index, allowing you to search through your vectors quickly and find relevant results efficiently. However, since this index-building step is so compute-intensive, it can take minutes, hours, or even days, depending on the size of the data you're indexing.
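In client code, that pipeline looks roughly like the following sketch; the embed() helper is a hypothetical stand-in for whatever embedding model you use:

```python
# Hedged sketch of the ingest pipeline: embed documents, upsert them, and
# let Qdrant build the ANN (HNSW) index in the background.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs",  # hypothetical collection name
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model mapping text to a 384-dim vector.
    return [0.0] * 384

documents = ["first document", "second document"]
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=i, vector=embed(doc), payload={"text": doc})
        for i, doc in enumerate(documents)
    ],
)
# Index building happens server-side as segments fill up; that background
# step is the compute-heavy part that GPU acceleration targets.
```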
A few months back, we started to look into whether we could leverage GPUs to speed up index-building time, and this is basically what we're launching today - Qdrant GPU-accelerated vector indexing. Our initial tests have shown exciting results: up to a 10x speedup in index build time compared with the traditional CPU-powered method, at comparable cost.
This is especially important when you have either large datasets - we're talking billions of vectors - or applications where you constantly want to index new data in real-time. Examples include real-time search, which keeps results up to date and relevant in dynamic content environments like social media. Another use case is AI agents, where near real-time indexing and re-indexing can ensure that agents deliver immediate, data-driven decisions in dynamic settings.
Now, the really unique part about our GPU-accelerated indexing is that it is platform-agnostic, so developers can leverage the GPU of their choice. As I mentioned earlier, we're big on empowering developers to build and scale real-time AI applications flexibly and free from hardware vendor constraints, so it was important to us that developers have the freedom to choose the fastest or most cost-effective GPU, or simply the one they already have available. Starting with our latest open-source version, 1.13, Qdrant supports a wide range of GPUs, including NVIDIA, AMD, and integrated GPUs (Intel, Apple Silicon), for real-time index building.
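From the client's perspective nothing changes; GPU indexing is a server-side concern. The sketch below shows the rough shape of the setup; the image tag and configuration variable are our assumptions about the 1.13 release, so check the official docs for the exact names on your platform:

```python
# Hedged sketch: GPU-accelerated indexing is enabled on the server, not in
# client code. Per our reading of the 1.13 release, the server is started
# from a GPU-enabled build with GPU indexing switched on, e.g. (assumed
# image tag and setting name - verify against the docs):
#
#   docker run --gpus=all -p 6333:6333 \
#       -e QDRANT__GPU__INDEXING=1 \
#       qdrant/qdrant:v1.13.0-gpu-nvidia
#
# Client code is unchanged; the HNSW graph simply gets built on the GPU.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="gpu_indexed",  # hypothetical collection name
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
# Subsequent upserts trigger background index builds on the GPU, which is
# where the up-to-10x speedup applies.
```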
You mentioned a number of different deployment types earlier - Managed Cloud, Hybrid Cloud and Private Cloud. Could you walk us through which of these excites you most?
We began by developing the Qdrant open-source version that you can deploy wherever you want and manage yourself. About two years ago we introduced our first fully-managed offering - Qdrant Cloud - which lets you deploy a Qdrant vector database on AWS, GCP, or Azure without worrying about operating it. It is a very easy way to get started and scale your applications effortlessly.
However, we have also received a lot of requests from our community for a deployment option that lets you run Qdrant in your own environment while keeping the benefits of a fully-managed deployment model like Qdrant Cloud. The key reasons were that our customers wanted to leverage their existing infrastructure or cloud plans, had compliance or privacy policies that required them to run AI applications in their own infrastructure - we see this a lot in the enterprise - or wanted to reduce latency by running the vector database in the same environment as the rest of their tech stack.
Since we saw a growing need there - and again wanted to provide deployment flexibility for our users - we launched Qdrant Hybrid Cloud in April last year. It enables the deployment of managed Qdrant clusters across any cloud or on-premise infrastructure, maintaining data privacy by segregating data and control layers.
We've seen incredible feedback from our customers regarding Qdrant Hybrid Cloud, as it unlocks new applications while maintaining full data sovereignty. Beyond that, we also offer a fully private deployment option called Private Cloud.
How does RAG evolve in a world in which foundation models are getting ever more capable? How are you thinking about evolving and/or maintaining your edge in the marketplace?
Great question! Foundation models have indeed become highly effective, but we're starting to see diminishing returns on performance gains with increased compute. This shift highlights a crucial aspect: the growing importance of context and internal information that isn't available for training foundation models, or is too specific or sensitive to include. As AI applications, particularly AI agents, increasingly focus on real-time context and decision-making, the relevance of RAG becomes more apparent.
As I mentioned before, we are trying to build the best vector search offering with a rich feature set, and this is what will ultimately make RAG applications perform the way the developer intends. When it comes to agentic RAG, which powers AI agents, we provide features that are particularly useful in this context: multi-tenancy ensures that multiple agents can collaborate in distributed systems; hybrid search combines semantic vector search, lexical search, and metadata filtering, enabling agents to retrieve highly relevant, contextually precise information; and the semantic cache improves agent efficiency by preserving query results based on semantic equivalence rather than exact matches.
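To illustrate the hybrid search piece, here is a hedged sketch using qdrant-client's Query API, fusing a dense and a sparse candidate list with reciprocal rank fusion; the collection and the named vectors "dense" and "sparse" are hypothetical and must match how the collection was created:

```python
# Hedged sketch: hybrid search via dense + sparse prefetch, fused with RRF.
# "agent_memory" and the vector names "dense"/"sparse" are hypothetical.
from qdrant_client import QdrantClient
from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="agent_memory",
    prefetch=[
        # Semantic candidates from the dense embedding.
        Prefetch(query=[0.2] * 384, using="dense", limit=20),
        # Lexical candidates from a sparse (e.g. BM25/SPLADE-style) vector.
        Prefetch(
            query=SparseVector(indices=[17, 42], values=[0.8, 0.3]),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # merge the two candidate lists
    limit=5,
)
for point in results.points:
    print(point.id, point.score)
```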
Check out this step-by-step guide to combining LlamaIndex, @MLflow, @qdrant_engine, and @ollama:
- Integrate LlamaIndex with Qdrant for efficient vector storage and search
- Use MLflow for model tracking, packaging, and evaluation
- Implement Change Data Capture for real-time… x.com/i/web/status/1…
- LlamaIndex 🦙 (@llama_index)
10:37 PM • Jan 6, 2025
Agents-in-production have become a very visible part of the AI narrative. How, if at all, do agents factor into your long-term product vision for Qdrant?
Ultimately, AI agents are one of many applications for vector search engines, and there will be many more in the future. So we try to look at the entirety of use cases and focus on which features can make the core of vector search more scalable, more efficient, more precise, and more real-time. This will ultimately benefit not just AI agents but other applications as well. Beyond this, AI agents will likely be a very important segment for us, which is why we work very closely with different AI agent frameworks to ensure easy integration. For example, Qdrant works out of the box with frameworks like CrewAI, LangGraph, Autogen, and Camel-AI.

So basically, we're focused on building foundational capabilities that will serve our users' needs whether they're building agent systems or other AI applications. That's why I think it's good to focus on the evolving requirements for a vector database. In my opinion, these break down into four categories: real-time, multimodality, agentic RAG, and data sovereignty.
Build your own AI-powered Customer Service Discord Bot with Local Models! 🚀
In this cookbook you will learn how to:
- Harness @ollama to run @Alibaba_Qwen's QwQ 32B-preview locally.
- Set up a @discord bot for customer service using 🐫 @CamelAIOrg, 🔥 @firecrawl_dev, and… x.com/i/web/status/1…
- CAMEL-AI.org (@CamelAIOrg)
5:13 PM • Jan 14, 2025
How do you see Qdrant's products progressing over the next 12 months? Anything specific that your customers should be excited about?
We have a lot of exciting projects planned for 2025, but I don't want to give too much away. On the managed offerings side, we're currently expanding our capabilities to further enable enterprises to leverage Qdrant across their organizations.
When it comes to our open-source project, we maintain a transparent roadmap and are always thankful for feedback, suggestions, ideas, and contributions from the community to help shape it - because that's ultimately how we prioritize our efforts. Beyond this, we're constantly looking for additional ways to provide the best deployment flexibility for our developers. The best way to get in touch with us is by joining our Discord community of over 7,000 members or checking out our GitHub.
How would you describe the engineering culture at Qdrant? Are you hiring? What do you look for in prospective team members?
Our distributed team logs in from 21 different countries, including Germany, Brazil, Spain, the United States, Poland, the Netherlands, and more. Flexible, async work across many time zones is just how we operate. So that we could all meet in person, we recently held an offsite in Gran Canaria, where everyone came together for brainstorming sessions, surfing, and stargazing.
To give a better sense of our culture, though, instead of me describing it, I asked a few of our engineers how they would. Here's what they had to say:
"The engineering team at Qdrant is all about teamwork and problem-solving - there's always someone willing to step up and help out, no matter the issue."
- Georgios Stefos, Cloud Features Team
"The engineering culture at Qdrant encourages me to own my work, explore the latest technologies, and continuously improve our systems while benefiting from a supportive and friendly remote environment where everyone is happy to help each other."
- Arthur Koziel, Cloud Resilience Team
"The technology is cutting edge, and we have a great mix of transparency and accountability, which fosters personal growth down here on Earth while our service is reaching for the stars."
- Dominic Page, Customer Support Team
As for hiring, we're always looking for talented individuals who thrive in a growth-oriented environment. Open roles are continuously posted on our website.
Conclusion
To stay up to date on the latest with Qdrant, you can:
Start free with a 1 GB RAM cluster, no credit card required: https://cloud.qdrant.io/login
Contribute: https://github.com/qdrant/qdrant/blob/master/docs/CONTRIBUTING.md
Subscribe to product updates: https://qdrant.tech/subscribe/
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.