Portkey is the control panel for your AI apps 🎛
Plus: Founder Rohit explains the hardest thing about AI observability…
CV Deep Dive
Today, we’re talking to Rohit Agarwal, co-founder and CEO of Portkey AI.
Portkey is a startup building developer-focused tools for AI observability, gateways and prompt management. Their goal is to be the ‘control panel’ for AI developers building applications on any of the hundreds of open and closed AI models, by providing an abstraction layer on top of model APIs that enables the development, deployment and management of LLM-based systems. Founded in early 2023, Portkey is the brainchild of Rohit and his co-founder, Ayush Garg.
At @PortkeyAI we are the default AI gateway for ~10M LLM requests every day across @OpenAI, @Azure, @anyscalecompute and other LLM providers.
We track the API success rates of various providers internally.
Sharing a glimpse that OpenAI has been seeing problems in the past 2 days…
— Rohit Agarwal (@jumbld)
4:09 AM • Nov 21, 2023
Today, Portkey has 1,000+ organizations incorporating their suite of tools within their AI/ML stacks, including Postman, Springworks and several Fortune 500s - all looking to understand and optimize their stack for accuracy, low latency and reduced cost. In August 2023, the startup announced a $3M seed round led by Lightspeed, which has enabled them to expand their capabilities beyond a gateway and observability suite and into prompt management.
In this conversation, Rohit walks us through his vision for Portkey, building sophisticated LLM observability systems, and his goals for 2024.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Rohit 💬
Rohit - welcome to Cerebral Valley. Firstly, walk us through your background and what led you to found Portkey?
Hi there! My co-founder Ayush and I have been working on generative AI applications for the past 4 years. Previously, I led Product at Freshworks, a CRM software company with a flagship customer support product - and our main focus was building bots that had high accuracy and low latency. While there, I also witnessed OpenAI’s journey from GPT-2 to GPT-3, which was very exciting.
After Freshworks, I joined Pepper, an AI content generation startup similar to Jasper and Copy AI, that quickly scaled to a million users and billions of generations for enterprises. It was at Pepper that I realized the DevOps layer for generative AI wasn’t mature yet, and that traditional engineering tools weren't able to handle the challenges that GenAI posed.
At the time, most workflows were solely deterministic, but now, advancements in generative AI have made workflows a lot more probabilistic - so an API returning a success or an error is no longer the barometer for observability. Ayush and I started thinking about a platform layer that could enable Gen AI apps to be built and launched much faster.
On the developer side, LangChain, LlamaIndex and other tools had been released to help developers adopt AI; however, we thought there needed to be a platform that enables developers to take their PoCs to production faster, with adoption as one key element and production readiness as another. That’s how Portkey was born.
Excited to announce ⚡️ Portkey.ai ⚡️ - A foundational model ops platform to help companies ship gen AI apps & features with confidence!
💡 Monitor usage, latency & costs
💼 Manage models with ease
🔒 Protect your user data #FMOps #LLMops
w/@ayushgarg_xyz
— Rohit Agarwal (@jumbld)
1:48 PM • Apr 19, 2023
Our goal is to be the AI control panel that developers can use to quickly add our AI gateway, which connects them to any model. We’ve done a lot of the heavy lifting around routing, request transformations, and making sure APIs are forward-compatible - combined with a strong observability suite to measure performance, accuracy and cost. We started Portkey in April 2023 and were fortunate to raise from Lightspeed, which accelerated our growth.
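To make that concrete, here’s a minimal sketch of what routing an existing OpenAI-SDK call through an AI gateway typically looks like. The endpoint URL and x-portkey-* header names below are assumptions for illustration rather than verified, current values - treat them as placeholders and check Portkey’s docs before using them.

```python
# Minimal sketch: pointing the standard OpenAI Python SDK at an AI gateway
# instead of calling the provider directly. The base_url and x-portkey-* header
# names are assumptions for illustration - verify against Portkey's docs.
from openai import OpenAI

client = OpenAI(
    api_key="PROVIDER_API_KEY",                  # your upstream provider key (or a gateway-managed key)
    base_url="https://api.portkey.ai/v1",        # send requests to the gateway rather than the provider
    default_headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",  # authenticates the request with the gateway
        "x-portkey-provider": "openai",          # tells the gateway which upstream provider to route to
    },
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this support ticket in one line."}],
)
print(resp.choices[0].message.content)
```

Because the request and response keep the familiar OpenAI shape, swapping the upstream provider becomes a header or config change rather than a code rewrite.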
Today, we’re working with some of the most forward-looking AI companies in the world, including Postman, Quizizz, MultiOn, and Springworks. These companies are at the frontier of generative AI, and they trust Portkey to manage their LLM API routing as well as the observability on top of it.
How would you describe Portkey to a developer who's new to what you’re building?
We are the best control panel for your generative AI applications. Our value proposition is the following: if you have a generative AI app or feature that you’re taking from PoC to production, you need to be able to manage everything that’s going on inside it and iteratively make changes. That’s what Portkey is all about.
With Portkey, you have access to an AI gateway as well as an observability platform, and we’re also working on creating a very capable prompt management system. We’ve had a prompt library for a while, but we've evolved this over the last six months by making it multimodal, having it support multiple models, and adding in features such as virtual keys and security guardrails - which gives you an experience we call ‘GitHub for prompts’. This feature fits in very well with our gateway and observability capabilities, because now you also have a space to continue iterating on input prompts and guardrails to determine whether your application is functioning effectively.
Could you elaborate on the phrase “GitHub for prompts”?
Today, a lot of companies end up writing prompts in code, and they’ll have code that’s bringing in multiple pieces of the prompt, compiling the template, inputting the user variables, and then making the request to the API. As you’re building, you often realize you need to add a new instruction, and you’ll have to go back to the code, change your prompt a little bit, test it in development, go to staging, and then go to production - and this entire cycle takes a really long time. What GitHub enabled was the ability for multiple developers to push code and deploy very quickly, and that’s the same approach we’re taking to prompts.
Within Portkey, you can have multiple people collaborating on prompt templates and then taking the best one to production right from within the app. You write your prompt templates in one single place, and the minute you deploy, we’ll give you an ID and an API which you can embed in your application. You can now update your prompt, test it, and deploy, and it will automatically get deployed without you having to touch code ever again.
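As an illustration of what that embedded call might look like, here’s a sketch using Portkey’s Python client. The method and field names (prompts.completions.create, prompt_id, variables) are assumptions based on the workflow described above, not a verified API reference, and the prompt ID is hypothetical.

```python
# Illustrative sketch of calling a deployed prompt template by its ID.
# Method and field names are assumptions based on the workflow described above;
# the deployed template lives in Portkey, and the app passes only the variables.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

completion = portkey.prompts.completions.create(
    prompt_id="pp-support-reply-123",  # hypothetical ID issued when the template was deployed
    variables={                        # runtime values filled into the template's placeholders
        "customer_name": "Alice",
        "issue": "refund not processed",
    },
)
print(completion.choices[0].message.content)
```

The point of the pattern is that editing and redeploying the template in the prompt library changes what this call returns, without the application code above ever changing.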
@PortkeyAI’s gateway is almost exactly what we’ve been looking for, for a while - multi-provider routing and management, open-source and self-hostable.
We've had to build our own with a different feature set, but this looks like a wonderful tool!
— Hrishi (@hrishioa)
1:42 AM • Jan 10, 2024
Who are your users today? Who’s finding the most value in using Portkey?
Today, our users are mainly ML architects at midsize enterprise companies looking to understand and optimize their AI stack better. We’re also used by a solid number of AI startups, and the sweet spot for us tends to be companies that have a successful PoC and are looking to get it to production, monitor it, and quickly release changes with confidence. This is where Portkey provides a huge amount of value in the ML stack.
We've had over a thousand organizations sign up to Portkey, all on a typical maturity curve. All through 2023, enterprises were making noise about generative AI and creating PoCs internally to see if they could derive real value from this new technology. Towards the end of last year, a lot of enterprises realized that AI is going to be immensely valuable to their business and that they need to begin productionizing it in their own stack.
In the last two months, we've seen many large companies start to adopt tools like Portkey within their own production workflows.
There are a number of teams working on AI observability. What do you think sets Portkey apart, technically or otherwise?
We focus heavily on production use-cases, and this impacts the way we think about observability and our AI gateway, which is different from the typical debugging use-case. When you're building your application, you need visibility into its behavior, and so there's a lot of debugging that usually takes place internally. Portkey does touch on that aspect during development, but our value increases multifold when you go to production, as we're the gateway that's able to handle your scale really well.
We open-sourced our gateway in January, and that actually generated a lot of attention for us as it was extremely minuscule - only 48 KB when deployed, with the ability to handle 20,000 requests per second without breaking a sweat. That’s what production companies really want - a fully-featured gateway that can also handle scale really well. So that’s one major USP for us: production readiness that’s extremely scalable.
On the observability side, our huge advantage is the depth of observability that we provide. In Portkey, it’s not just the typical cost-per-token tracking across 150 models with different pricing schemes, and all the configuration that requires - it’s also guardrails and accuracy, meaning real-time data on how your app is performing. We want to ask: “is it improving, is it degrading, which sets of users don’t like our AI features, and what can we do to improve it?”
So, it's our entire stack of the gateway plus observability, plus prompt management guardrails, that makes Portkey the foremost platform for an AI engineer in a mid-market or enterprise company.
"Great product guys, absolutely love it, and the simplicity of just using already known JSON object and parameters just streamlines the whole experience man! You guys seriously just made it to damn easy!"
- actual quote from Alex Holmberg 🫶
f̶r̶o̶m̶ o̶p̶e̶n̶a̶i̶ i̶m̶p̶o̶r̶t̶…
— Portkey (@PortkeyAI)
10:57 AM • Feb 7, 2024
Do you find your users are gravitating towards specific models for their use-case, or are you seeing convergence towards the more popular models (GPT-4, Claude etc.)?
Generally, our users start with an OpenAI or a Claude model because they are so accessible and have proven to work extremely well for a wide variety of use-cases. Over time, though, we do see that developers want to optimize for a reduction in latency and cost and improve performance significantly - and that's where open source comes into play. Users will pick two or three models for different use cases and deploy them in a controlled fashion.
For example, we have this very interesting feature called Load Balancing, where you can send 95% of your requests to OpenAI but route the other 5% of your traffic to Mistral. That’s a really compelling experimentation suite, which companies are finding valuable in helping them adopt an open-source or fine-tuned model, as they’re able to test alternatives quickly and go to production with them. We have configurations that allow you to do this within the gateway itself, where you can route to specific models for specific use cases. We also attach metadata to each request, so individual requests can be routed to different models.
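As a rough sketch of what a weighted split like that could look like when passed to the client, here is an illustrative config; the key names ("strategy", "targets", "virtual_key", "weight", "override_params") and the config parameter are assumptions modeled on Portkey’s config format and may differ from the current schema.

```python
# Sketch of a weighted load-balancing config: ~95% of traffic to an OpenAI key,
# ~5% to a Mistral key. Key names and the config= parameter are assumptions
# for illustration - verify against Portkey's current config schema.
from portkey_ai import Portkey

lb_config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"virtual_key": "openai-prod-key", "weight": 0.95},   # bulk of production traffic
        {
            "virtual_key": "mistral-canary-key",              # small experimental slice
            "weight": 0.05,
            "override_params": {"model": "mistral-medium"},   # force the model for this slice
        },
    ],
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=lb_config)

resp = portkey.chat.completions.create(
    model="gpt-4",  # default model for the primary target
    messages=[{"role": "user", "content": "Draft a one-line release note for v2.3."}],
)
print(resp.choices[0].message.content)
```

Because the split lives in the config rather than in application code, widening the experimental slice from 5% to 50% is a one-line change once the results look good.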
That said, while we do enable this, we actually don't see a lot of users starting off using dynamic routing. We find that most enterprises experiment with different models in the pre-production phase - seeing which models work better than others. But, once they’ve zeroed in on a specific model, they’ll stick with that and slowly become better at using it, instead of dynamically routing requests to various models. This may be because there's already so much variability in these models that users don't want a further probability factor to be at play.
We’ve explored dynamic routing, but learned from our customers that routing to the cheapest or fastest model didn’t really work well in production. The concept sounds really good in theory, and we also loved that story back in October 2023 - but, having spent time with our customers, we’ve noticed a repeated pattern: companies are comfortable with variance and unpredictability initially, but then tend to settle down on use-cases and say “this model works for me, and I'll just go from here”.
For my projects I am increasingly switching between LLM providers. Sometimes based on token count, sometimes based on cost or availability. APIs that have the same signature (thanks @anyscalecompute) are super handy and this makes it even more seamless!
— Julian von der Goltz (@jhvdgoltz)
5:39 PM • Jan 8, 2024
How does Portkey balance research vs. productization, especially with the pace of AI breakthroughs?
We find that we have to continuously update our product in order to accommodate the latest research, and keep up with everything that’s happening in the world of AI. We don’t have a specific AI research team, but what we do is more product-focused research - for example, studying the models that are successful in various contexts and asking ourselves “where are these successful the most? What kind of prompting methods are prevalent? How are people composing these prompts and then getting them into production?”
Those methodologies are what we monitor closely, rather than core AI research. We’re not a company that will create our own foundational model, but we’ll definitely fine-tune specific eval models that are extremely low latency and low cost.
How do you anticipate incorporating multimodal AI into Portkey’s product?
In a lot of ways the process is similar, but we’ve had to update the entire system to support multiple modalities. For example, the way images-plus-text works is different from the way audio works - the core product stays the same, but we are now able to route prompts to multiple vision or audio models and combine them with text. You can also upload an image as part of your template, or accept variables that are images, and in the logs we’ll actually show you the image that was generated by you or the user. Portkey makes it easy for you to view all of these different elements.
When it comes to multi-modal, the key idea is to make Portkey a really fast and efficient production experience for developers across all modalities. Our goal is for multi-modal to become an extension of what developers are already doing with text, and show that upgrading to using vision is just another way to express ideas and write code.
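As an illustration, an image-plus-text request routed through the same gateway-pointed client could look like the sketch below. The message schema follows the common OpenAI-style vision format; the endpoint, header names, model name, and image URL are placeholders rather than confirmed values.

```python
# Sketch: routing an image-plus-text prompt through the gateway using the same
# OpenAI-compatible client shape as the earlier text example. Endpoint, headers,
# model name, and image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    api_key="PROVIDER_API_KEY",
    base_url="https://api.portkey.ai/v1",        # gateway endpoint (assumed, as above)
    default_headers={
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-portkey-provider": "openai",
    },
)

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",                # any vision-capable model reachable via the gateway
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What product is shown in this screenshot?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```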
Been blown away with the support for our open source LLM gateway!
thought I'd post it here and leave for the day, but grateful for all the feedback and questions which came our way!
gives the TS and OSS community more power
— Rohit Agarwal (@jumbld)
10:08 AM • Jan 9, 2024
What have some of the main technical challenges been with building Portkey?
I’d say there are two main challenges. Firstly, just keeping up with all that’s happening in the world of AI. A new model will come up, or pricing will change, and everybody wants to try the latest and greatest framework. So - how do you keep supporting all of this and making sure nothing breaks in the process? That’s why having our processes streamlined and making sure we're continuously staying ahead of the curve has been important, but obviously very challenging.
For example, we have a pricing configuration file that is extremely large as there are just so many configurations of pricing that can happen. Eventually, we want to abstract this away so that our customers don’t need to do it again and again, and you have the optionality to choose whichever pricing you want.
Secondly, a bigger technical challenge for us lies in optimizing latency. The number one question everybody asks us is “if I use your gateway, or if I start using guardrails online, will it increase latency?” This is something we’ve been maniacal about clarifying - no, Portkey will not increase latency, and in some cases we even decrease latency significantly. This was a big technical challenge - how do you build something extremely thin, in an architecture that does not delay calls at all?
We've done a lot of work around adaptive streaming and deploying on the edge - in fact, we had a use-case that showed that if you're comparing making a call from Sydney to OpenAI to including Portkey as the router in the middle, the Portkey call ends up going faster because of the edge server, which chooses the fastest route to ping the OpenAI servers and then come back. And that's been really compelling for us to see.
What are your goals for Portkey over the next 6-12 months?
This year is going to be about depth for us. We've chosen our lane by saying we want to be the best in observability - with the most performant gateway, the best prompt management and the best guardrail system out there. So, this year, it's just going to be about going deeper into achieving this goal and incorporating AI into those pieces as well. For example, Portkey already notifies you when a prompt is not performing, but can we make it so that it then gives you suggestions on how to make it better, or automatically improve your prompts for you?
Secondly, I’d also say that agents are a really big area of research right now, and companies are starting to adopt agents more broadly. Our challenge there is figuring out how agents go to production, and how we support that across the three layers we’ve built in Portkey. With agents, the multi-step process does make things a little more difficult - for example, we still have to understand how to insert guardrails, where to run evaluations, and whether to support multiple models in the same process. This means the entire workflow changes, and because there are so many agent implementations, we’ve also got to be able to support all of them. That’s been the core challenge so far.
That said, we're seeing some really interesting agent companies come up. Obviously, MultiOn is one such company that's creating a browser-based agent and doing really well. But, we’re also seeing a lot more companies that are now experimenting with agents not to make a completion or to answer a question, but to complete a goal. And I feel that’s a really interesting future that’s getting closer by the day.
This is a great ship.
If you’re building AI apps, it’s easy to save money and ship a better UX.
Cache hits are good for you and for the user.
— Drew Bredvick (@DBredvick)
11:07 PM • Jan 26, 2024
Lastly, tell us about the team culture at Portkey. What do you look for in prospective team members, and are you hiring?
We're currently nine people right now, and I’d say there are a few things we're doing differently. One, we're trying to keep the team extremely lean. We handle over 2 billion tokens every day, the scale of our operations is insane, but we still have a really tiny engineering team. We also have a very capable developer-relations team that's reaching out to folks and making sure that they know about solutions like Portkey, or even educating our customers on the best way to go to production. LLMS in Production is our community, which we take a lot of effort in curating as well.
The other important piece is that we’re working to be a very engineering-led organization. There are hardly any product analysts or product managers at this point in time, and we’re focused on engineers taking the lead on building this foundational platform. Almost everybody on the team comes from an engineering background, and we’ve all written code at some point in our lives, whether we’re still doing so or not. We hire very occasionally - right now, we’re definitely hiring for a technical writer and a solid DevOps engineer as we start working with enterprises. But outside of that, we hire very specifically.
Live Now: LLMs in Production Event from SF! 🌟
We're kicking off an insightful evening exploring production-specific questions and issues on scaling LLM apps!
Link to join the livestream in the next tweet! ↓
feat. @lightspeedvp, @databricks, @llama_index, @getpostman, @yi_ding…
— Portkey (@PortkeyAI)
2:36 AM • Feb 1, 2024
Conclusion
To stay up to date on the latest with Portkey, follow them on X (@PortkeyAI), join their Discord community and sign up at Portkey.ai.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.