Timescale is making PostgreSQL better for AI ⚡️
Plus: AI Product Lead Avthar Sewrathan on pgai and pgvectorscale...
CV Deep Dive
Today, we’re talking with Avthar Sewrathan, AI Product Lead at Timescale.
Timescale is on a mission to help developers use PostgreSQL for everything. The company is renowned for its innovative approach to enhancing PostgreSQL databases, having built several open-source extensions and a robust cloud product for developers building a wide variety of applications, from RAG and agents to IoT and finance.
Avthar, who has been with Timescale since 2019, brings a wealth of experience from his background in product and developer relations, and as a venture-backed startup founder in the crypto and privacy space. He has been pivotal in driving the company’s innovations for AI developers, including pgai and pgvectorscale, two newly released open-source extensions that make PostgreSQL a better database for AI applications.
Timescale has raised more than $180 million in funding from investors including Benchmark, New Enterprise Associates, Redpoint Ventures, Icon Ventures, Two Sigma Ventures, and Tiger Global, and serves thousands of customers globally.
In this conversation, Avthar shares insights into Timescale’s bet on PostgreSQL, their new open-source extensions for AI developers, and how their solutions are empowering developers to easily build sophisticated AI applications.
PGVECTOR IS NOW FASTER THAN PINECONE. And 75% cheaper thanks to a new open-source extension – introducing pgvectorscale.
🐘 What is pgvectorscale?
Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability (keep…
— Avthar (@avthars)
1:18 PM • Jun 11, 2024
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Avthar 💬
Avthar, welcome to Cerebral Valley! First off, tell us a bit about yourself and what led you to join Timescale.
Thanks for having me here. My name is Avthar, I’m from South Africa, and I'm the AI Product Lead at Timescale. Timescale is a PostgreSQL database company. We’ve built open-source PostgreSQL extensions and a cloud product that hosts and manages Postgres databases for demanding applications, from AI to IoT to finance, as well as general applications that use a relational database.
I've been in the developer products space for eight years now. Before Timescale, I was a startup founder in the crypto and privacy space, which gave me experience in building a useful product from the ground up, and wearing different hats from engineering, to product, to marketing and sales.
I joined Timescale in 2019 as an early employee. I've been with the company for five years now, taking on a number of roles in Developer Advocacy, Product and now AI. The main reason I joined was the founders, particularly Timescale co-founder and CTO Mike Freedman, who advised my research in the Computer Science department at Princeton University. He told me about the exciting things Timescale was doing, and the scope to make an impact. I was drawn to how strongly people loved the product, with developers using it for all kinds of use cases beyond the advertised “time-series”. That drew me to the company and that developer love has kept me here ever since.
How would you describe Timescale to the uninitiated developer interested in AI?
Timescale is a Postgres data company. We help developers use PostgreSQL to power AI, time series, and analytics applications. Timescale has open-source products which developers can use for free, like TimescaleDB, and now pgvectorscale and pgai. Developers can install those extensions on any PostgreSQL database and use their capabilities to turn Postgres from a general-purpose relational database into a specialized vector database or time-series database. This removes the need to adopt a separate database and enables developers to simplify their data stack by just using PostgreSQL.
Timescale also has a managed database product, which provides a worry-free experience for running PostgreSQL in the cloud, taking care of things like backups, replicas, and security. For AI, Timescale Cloud offers a stack of open-source extensions including pgvector, the popular extension for vector data on PostgreSQL, as well as pgai and pgvectorscale. These give developers a variety of tools to easily build and scale RAG, search, and agent applications.
PostgreSQL is the Swiss army knife of databases
— Avthar (@avthars)
1:59 PM • Apr 29, 2024
Who uses Timescale and why do they choose it?
We find that application developers, data engineers, and more recently, AI engineers are the primary users of Timescale. The main reason people choose our product is because of PostgreSQL. According to last year’s Stack Overflow Developer Survey, Postgres is the most loved database among developers, including professional developers.
Developers come to Timescale to use Postgres beyond the general relational use cases. Initially, Timescale was popular for time series and analytics. Now, it's increasingly also used for AI applications. Notably, Timescale excels at workloads that involve large-scale vector storage and retrieval, and as I’ll talk about later, we’ve built pgvectorscale to further help developers with that use case.
Let’s dive a little deeper into your two new offerings, pgai and pgvectorscale. Could you expand on why these are so important for the AI community?
We built pgai and pgvectorscale to make PostgreSQL better for AI applications. We focused on two key problems: improving performance for large-scale vector use cases, with pgvectorscale, and making it easier to build AI applications on Postgres, with pgai.
pgvectorscale is an open-source extension that builds on pgvector for enhanced performance and scale. And pgai is an open-source extension that brings embedding creation and LLM completions to the database, giving more Postgres developers the skills of AI Engineers. Both extensions complement pgvector, the popular open-source extension for vector handling in PostgreSQL, and rely on its capabilities.
For performance and scale, we wanted to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads. To address this, we developed pgvectorscale, which enhances Postgres' performance for large-scale vector use cases. Pgvectorscale does this thanks to two key innovations:
StreamingDiskANN vector search index: A new high performance, cost-efficient search index for pgvector data. StreamingDiskANN overcomes limitations of in-memory indexes like HNSW (hierarchical navigable small worlds) by storing part of the index on disk, making it more cost-efficient to run and scale as vector workloads grow.
Statistical Binary Quantization: Developed by researchers at Timescale, this technique improves on standard binary quantization by increasing accuracy when quantization is used to reduce the space needed for vector storage. Quantization is essentially a compression technique for embeddings that allows more vectors to be stored in less disk space, reducing storage costs and speeding up queries.
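To make the quantization idea concrete, here is a minimal Python sketch. It assumes the "statistical" improvement amounts to thresholding each dimension at its dataset mean instead of at zero, which keeps the bits informative when values in a dimension are not centered on zero. The function names are hypothetical, and this is a simplification for illustration, not pgvectorscale's code; the real implementation may differ in its details.

```python
from typing import List, Sequence


def dimension_means(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean over the dataset, used as the quantization threshold."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def quantize(vector: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one bit per dimension: bit d is set when vector[d] exceeds thresholds[d]."""
    code = 0
    for d, (x, t) in enumerate(zip(vector, thresholds)):
        if x > t:
            code |= 1 << d
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits: the cheap distance used on quantized codes."""
    return bin(a ^ b).count("1")


def nearest(query_code: int, codes: Sequence[int]) -> int:
    """Index of the stored code closest to the query code by Hamming distance."""
    return min(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))


# Toy data (hypothetical): every value is positive, so a zero threshold
# maps all three vectors to the same all-ones code.
vectors = [[0.75, 0.25, 0.5], [0.25, 0.75, 0.5], [0.5, 0.5, 1.0]]
means = dimension_means(vectors)
zero_codes = [quantize(v, [0.0] * 3) for v in vectors]  # plain binary quantization
sbq_codes = [quantize(v, means) for v in vectors]  # mean-thresholded variant

query_code = quantize([0.8, 0.3, 0.4], means)
best = nearest(query_code, sbq_codes)
```

With a zero threshold all three vectors collapse to the code 7 and become indistinguishable, while the mean-thresholded codes are 1, 2 and 4. Storing one bit instead of a 32-bit float per dimension shrinks vectors 32x; systems using this kind of compression commonly rescore the top candidates with full-precision vectors to recover accuracy.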
You can learn more about the technical details of pgvectorscale in this “how we built it” blog post.
To test the performance impact of pgvectorscale, we compared the performance of PostgreSQL with pgvector and pgvectorscale against Pinecone, widely regarded as the market leader for specialized vector databases. We go into detail about the benchmarking methodology and results in this pgvector vs. Pinecone comparison blog post. The TL;DR is that pgvectorscale helps PostgreSQL achieve better performance than specialized vector databases like Pinecone, meaning that performance and scale are no longer concerns when using Postgres for AI applications.
With pgai, our goal was to make it easier for Postgres developers to build AI applications by bringing AI models closer to the database. Pgai enables embedding models and generation models like GPT-4o to be accessed directly within the database via SQL queries.
This simplifies tasks like:
Embedding creation: Creating or updating embeddings for every row in a table of data with a simple SQL query, eliminating the need for a complex data pipeline.
LLM Reasoning on data inside PostgreSQL: Performing tasks like summarization, classification, and data enrichment with a SQL query and storing the results in the database or outputting them to users.
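The core idea behind in-database embedding creation is that a model call is exposed as a SQL function, so one UPDATE statement can fill in embeddings for a whole table. The sketch below illustrates that pattern with Python's built-in sqlite3 module so it runs without PostgreSQL or an API key. `toy_embed` is a hypothetical stand-in for a real embedding model, and none of this is pgai's actual API.

```python
import json
import sqlite3


def toy_embed(text: str) -> str:
    """HYPOTHETICAL stand-in for an embedding model call.

    Buckets letters into 4 dimensions and returns the vector as JSON text.
    A real setup would call an embedding model here instead.
    """
    vec = [0.0] * 4
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[(ord(ch) - ord("a")) % 4] += 1.0
    return json.dumps(vec)


conn = sqlite3.connect(":memory:")
# Register the "model" as a SQL function so queries can call it per row.
conn.create_function("toy_embed", 1, toy_embed)
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT, embedding TEXT)")
conn.executemany("INSERT INTO docs (content) VALUES (?)", [("abcd",), ("aaaa",)])

# A single SQL statement embeds every row that lacks an embedding:
# no separate pipeline that reads rows out, calls a model, and writes them back.
conn.execute("UPDATE docs SET embedding = toy_embed(content) WHERE embedding IS NULL")
conn.commit()

rows = conn.execute("SELECT id, embedding FROM docs ORDER BY id").fetchall()
```

The `WHERE embedding IS NULL` clause means the same statement can be rerun to pick up only newly inserted rows.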
GIVING POSTGRESQL DEVELOPERS AI ENGINEERING SUPERPOWERS. Thanks to a new open-source extension – introducing pgai.
🐘 What is pgai?
Pgai is a PostgreSQL extension that brings more AI workflows to PostgreSQL, like embedding creation and model completion. Pgai makes AI…
— Avthar (@avthars)
4:10 PM • Jun 12, 2024
Currently, pgai supports OpenAI embedding and chat models, but we’re adding support for open-source models via Ollama next.
There are a number of teams focusing on open-source databases for AI applications. What sets Timescale apart from a developer's perspective?
Developers often face a choice between using a specialized database or a general-purpose database that also supports their needs. Timescale stands out because it enables developers to use PostgreSQL without giving up any performance, scale or ease of use benefits that come with specialized vector databases.
I’ve discussed how pgvectorscale and pgai solve for performance and ease of use above, but here are a few more key reasons why developers choose PostgreSQL for AI applications:
Existing Knowledge and Reliability: Many developers and their teams are already familiar with Postgres and might already deploy it as an application database. Postgres is known for its reliability and robustness.
Unified Data Storage: You can store multiple data types in the same database for AI applications. Vectors can sit alongside metadata and customer data, allowing for easy joins between them, removing the need to sync and duplicate data across multiple databases.
Tooling and Ecosystem: Postgres has a rich ecosystem of drivers, libraries, connectors and other tooling, making it easy to connect to everything else in a data stack. SQL is one of the first languages that data engineers and developers learn and PostgreSQL lets developers use SQL for search and RAG, with the full power of filters, groupings, views and more.
Simplified Data Architecture: Instead of managing separate databases for vectors, relational data, and analytics, you can just use PostgreSQL. This is especially important in the fast-changing AI market, where time spent wrangling multiple databases is time taken away from developing features and improving your product.
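To illustrate the "unified data storage" point, the sketch below keeps embeddings in the same database as customer records and answers a filtered similarity search with one JOIN. It uses sqlite3 with a hand-registered `cosine_distance` function purely so it runs anywhere; the table names and data are hypothetical. In PostgreSQL with pgvector, the embedding would be a `vector` column and the ORDER BY would typically use pgvector's cosine-distance operator, `embedding <=> $1`.

```python
import json
import math
import sqlite3


def cosine_distance(a_json: str, b_json: str) -> float:
    """1 - cosine similarity between two JSON-encoded vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


conn = sqlite3.connect(":memory:")
conn.create_function("cosine_distance", 2, cosine_distance)
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, plan TEXT);
    CREATE TABLE docs (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customers(id),
                       embedding TEXT);
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Acme", "pro"), (2, "Globex", "free")])
conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", [
    (10, 1, json.dumps([1.0, 0.0])),
    (11, 2, json.dumps([1.0, 0.0])),
    (12, 1, json.dumps([0.0, 1.0])),
])

# Vector ranking, a relational filter, and a join in one query:
# no second database to keep in sync.
query = json.dumps([1.0, 0.1])
results = conn.execute(
    """
    SELECT d.id, c.name
    FROM docs AS d
    JOIN customers AS c ON c.id = d.customer_id
    WHERE c.plan = 'pro'
    ORDER BY cosine_distance(d.embedding, ?)
    LIMIT 2
    """,
    (query,),
).fetchall()
```

Document 11 is as close to the query as document 10, but it belongs to a customer on the free plan, so the metadata filter excludes it in the same statement that ranks by similarity.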
Timescale offers developers a way to easily run their databases in the cloud while inheriting all the benefits of PostgreSQL outlined above. The combination of specialized performance and ease of use from extensions like pgvectorscale and pgai with the familiarity, reliability, and simplicity of PostgreSQL makes Timescale a great choice for developers building AI applications.
Are there any specific customer success stories that you'd like to share, whether it's with the AI offerings or in general?
There are two that come to mind.
First, there's a really cool company called OpenSauced. They are an insights platform that ingests millions of GitHub events about popular open-source projects like React, Kubernetes, and AI projects. They enable users to ask questions about open-source projects such as who the most active contributors are, how someone has contributed to a project, and how the project has grown. They use Timescale and pgvector for vector storage and similarity search in their AI agent application, called StarSearch. The agent can answer user questions about recent changes in a repository by synthesizing data from their database and results from web searches, which helps users learn more about the project and gain insight into its contributors.
StarSearch by OpenSauced, an AI agent powered by Timescale.
The second story I’ll highlight is Market Reader, a company building an AI-enabled financial information product. They help users understand why the market is moving. For example, if Nvidia's stock increases by 20%, they can summarize the key news driving that change, such as partnerships, competitor moves, and earnings announcements. Their product is powered by Timescale, and they use pgvector for their RAG functionality.
This is really exciting news from the @TimescaleDB team. We've been building @marketreaderinc on Timescale tech from the beginning, and this only makes our work in the AI/LLM space even more powerful! Congrats guys!
— Web Begole, CMT 🇺🇲🇺🇦 (@web_begole)
9:47 PM • Jun 11, 2024
Overall, we’re excited that more companies are building AI features into their products using PostgreSQL as their database.
Talk us through some of the technical challenges around Timescale’s development of pgvectorscale and pgai.
For these two particular extensions, you need a deep level of database expertise. I'm fortunate to have a team that includes PhDs in computer science, like my engineering lead, Matvey Arye, who was one of the original builders of TimescaleDB. Some of the key challenges we faced included low-level integration of the StreamingDiskANN index into PostgreSQL and at a higher level, deciding when to build things from scratch versus when to build on top of existing work.
In the AI space, with new projects emerging rapidly, it's crucial to focus on what creates the most value for users. For instance, with pgai and pgvectorscale, we chose to build on top of pgvector, using its data type, search operators, and distance functions, rather than reimplementing them. This allowed us to focus more on developing our own indexing algorithm and data structure for efficiently storing and searching vector data. Building on top of existing work is a strength of open-source software, and it clearly lets teams like ours use engineering time more efficiently. Both pgai and pgvectorscale are open-source and we welcome contributions from anyone interested.
Another notable challenge we solved was improving filtered search. We implemented a streaming filtering solution, which enables developers to get high accuracy when performing vector search with secondary filters, a very common pattern in RAG use cases. Filtered search was something we heard complaints about with pgvector’s HNSW index.
Diagram of streaming filtering in pgvectorscale. Learn more here.
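The difference between filtering a fixed-size candidate list and filtering a stream can be shown with a toy example. The sketch below assumes, as we understand streaming filtering, that the index search is consumed as a stream of roughly nearest-first candidates and keeps going until k of them pass the filter. `ef` mimics a fixed candidate-list size such as pgvector's `hnsw.ef_search` setting. The graph walk is a simple best-first search, not the StreamingDiskANN algorithm, and all names are hypothetical.

```python
import heapq
import math
from itertools import islice
from typing import Callable, Dict, Iterator, List, Sequence

Vector = Sequence[float]


def graph_stream(graph: Dict[int, List[int]], vectors: Dict[int, Vector],
                 entry: int, query: Vector) -> Iterator[int]:
    """Best-first walk over a proximity graph, yielding node ids roughly nearest-first."""
    visited = {entry}
    frontier = [(math.dist(vectors[entry], query), entry)]
    while frontier:
        _, node = heapq.heappop(frontier)
        yield node
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                heapq.heappush(frontier, (math.dist(vectors[nb], query), nb))


def post_filter(stream: Iterator[int], keep: Callable[[int], bool],
                k: int, ef: int) -> List[int]:
    """Take a fixed number of candidates, then filter: may return fewer than k."""
    candidates = list(islice(stream, ef))
    return [n for n in candidates if keep(n)][:k]


def streaming_filter(stream: Iterator[int], keep: Callable[[int], bool],
                     k: int) -> List[int]:
    """Keep pulling candidates from the search until k of them pass the filter."""
    return list(islice((n for n in stream if keep(n)), k))


# Toy data: six points on a line, each linked to its neighbours.
vectors = {i: (float(i),) for i in range(6)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
labels = {i: "old" if i < 4 else "new" for i in range(6)}


def keep_new(node: int) -> bool:
    """Secondary filter, e.g. a metadata condition in the WHERE clause."""
    return labels[node] == "new"


query = (0.0,)
few = post_filter(graph_stream(graph, vectors, 0, query), keep_new, k=2, ef=3)
enough = streaming_filter(graph_stream(graph, vectors, 0, query), keep_new, k=2)
```

Here the three nearest candidates all fail the filter, so post-filtering returns an empty list even though matching rows exist, while the streaming version returns nodes 4 and 5. The trade-off is that a very selective filter can make the stream walk much more of the index before it finds k matches.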
Tell us about Timescale's roadmap for the next 6-12 months. How do you see your product offerings evolving?
It's going to be very exciting. We've been working on these two extensions, pgai and pgvectorscale, for the past several months, and this is just the first step in our quest to make Postgres a better database for AI. I can’t go into too much detail, but at a high level, our roadmap includes:
Performance and Scale: Continuing to innovate on the performance and scale front with pgvectorscale, making it even better. We plan to further improve areas like index build times, filtered search, and quantization.
Embedding Creation and Updating: We want to make creating and updating embeddings for data even easier. This includes ensuring embeddings are updated as the underlying data changes, as well as making it easy to test different embedding models to see which one suits your use case best.
Text to SQL and Structured RAG: This is an area I'm super excited about. Many Timescale customers store a lot of structured data in their database tables, like time-series, analytics, and event data, and want to enable their users to ask questions about that data. Dashboards can only go so far. So we’re working on features that will allow developers to give their users the ability to get more insights from their data.
Lastly, how would you describe the culture at Timescale? Are you hiring?
Timescale's culture is intense but flexible. We really care about developers and providing a high-quality experience for them. We're a remote-first company with team members all over the world, on every continent except Antarctica.
It's intense because we have high standards for quality. Our products are critical, especially being a database company. If your database fails, it's a significant issue, so we pay careful attention to detail and care deeply about the craft of our software. At the same time, we're very flexible. We work asynchronously, allowing folks to do their best work at the times and places that suit them. We come together for off-sites, but most of our work is done remotely.
In terms of new team members, we’re definitely hiring. If you're interested in working on the challenges of making Postgres a better database for AI, please get in touch. If you don’t see an open job, reach out to me on Twitter or LinkedIn. We're hiring across engineering, product, and developer advocacy roles, particularly focused on AI.
For prospective hires, we look for people who can think deeply about customer problems and understand their needs. While there's often an emphasis on speed and experimentation, we value those who can balance that with careful consideration of what our customers need. This isn't just for product managers—many of our engineers have a product-focused mindset.
We want individuals who can think about what the right workflows look like and what developers and AI engineers would want in a solution. It's about having a craft-focused approach, paying attention to detail, and caring about solving the right problems for our customers. While technical elegance is important, it should serve the primary goal of addressing customer needs effectively.
Conclusion
To stay up to date on the latest with Timescale, follow them on X and learn more about them at Timescale.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email or DM us on Twitter or LinkedIn.