Baseten is pushing the boundaries of AI Inference ⚡️

Plus: CEO Tuhin on AI models, Truss and Baseten's Series B...

CV Deep Dive

Today, we’re talking with Tuhin Srivastava, CEO and Co-Founder of Baseten.

Baseten is an AI infrastructure platform that helps machine learning teams run large ML models while abstracting away many of the underlying processes required to run those models in a production environment. Founded in 2019 by co-founders Tuhin, Amir Haghighat, Philip Howes and Pankaj Gupta, the startup’s mission is ‘to be the most performant, scalable, and reliable way to run your ML workloads’. Its platform is currently focused on inference, with version management, observability and orchestration built in.

Today, Baseten has thousands of machine learning teams using its tools for inference, including at companies like Descript, Picnic Health, Writer, Patreon, Loop, and Robust Intelligence, and the team is fielding strong demand from enterprises and small teams alike - all clamoring to incorporate Baseten into their own AI stack. The fast-growing startup just announced a $40m Series B led by IVP and Spark Capital, with participation from existing investors Greylock, South Park Commons, Lachy Groom, and Base Case.

In this conversation, Tuhin walks us through the founding premise of Baseten, why AI inference is the name of the game, and Baseten’s goals for the next 18 months.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Tuhin 💬

Tuhin - welcome to Cerebral Valley! Firstly, take us back to 2019 and tell us about what led you to start Baseten. 

Back in 2019, we had an overarching belief that ML and AI were going to be a huge thing.

At that time, I don’t think anyone imagined things would progress as quickly as they have in the past few years. That said, we had spent the better part of a decade building ML-powered products, and realized that even once you have a model, it takes a ton of time to figure out how to get it running efficiently in a production environment. Then, as models grow larger and their underlying compute needs grow with them, it gets even more challenging. We wanted to build something we wished we’d had access to when deploying models and building products, and so we started Baseten.

How would you describe Baseten to an ML team who might be unfamiliar with the product? 

Baseten is an inference product. We allow you to bring custom or open-source models, deploy them, and get them served. There are three things we try to provide for all our customers: a delightful developer experience, high performance, and reliability and scalability. What that means is that we're enabling our customers to run models very quickly - that might be with maximum GPU utilization, a lot of tokens per second, or minimal latency. That’s what we’re all about.
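As a rough sketch of what “bring a model, deploy it, get it served” looks like in practice: Baseten’s open-source library Truss (discussed later in this interview) packages a model behind a small `Model` class with `load` and `predict` hooks. The specific model here - a Hugging Face text-classification pipeline - is our own illustrative choice, not Baseten’s example:

```python
# model/model.py -- the entry point a Truss-packaged model exposes.
from transformers import pipeline  # illustrative dependency for the sketch


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Runs once when the deployment boots, before any traffic arrives,
        # so weights are loaded ahead of the first request.
        self._model = pipeline("text-classification")

    def predict(self, model_input):
        # Runs per request with the deserialized request body.
        return self._model(model_input["text"])
```

From there, deploying is roughly a single `truss push` from the project directory; treat the exact CLI commands and class interface as things to verify against Truss’s current docs.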

Who are your users today? Who’s finding the most value in using Baseten?

Our customers are mainly engineers or engineering teams running large models, across small and large companies. In the beginning, we worked with digital-native and AI-native companies - we actually didn’t target enterprises at first. The common thread is that these are folks who care about their user experience and may have some enterprise-like requirements for performance, reliability, and security, but our product is pretty applicable to teams as small as one, all the way up to tens of thousands.

You’ve been developing tools for ML teams since 2019. How did the release of Midjourney and ChatGPT shape your trajectory, especially in the context of inference?  

A couple of things changed immediately. Firstly, when ChatGPT came out, it created a new ‘contract’ for users - overnight, people expected AI systems to respond in real time. When Stable Diffusion came out, that was probably an even bigger moment than Midjourney, as it was an open-source model. All of a sudden, an ecosystem formed around these models.

Prior to those moments, we were actually doing a lot more than inference; we had taken it for granted that inference was hard. In the aftermath of Stable Diffusion, and then GPT-4 in 2023, we started focusing more heavily on inference. Obviously, if you look at the market now, inference is the name of the game.

You offer a suite of model serving tools and optimizations - what areas of focus are you prioritizing internally at Baseten, especially considering how important inference has become?

We’re involved in model deployment, observability and a lot more - we really focus on a whole set of workflow tools ranging from version management to observability and orchestration. We’re also really focused on performance - we put in a lot of work with TensorRT-LLM from NVIDIA, for example, to make sure language models are running as fast as they can on our tools. Lastly, we focus heavily on infrastructure - with Baseten, you can deploy your model on any cloud and then it can spill over to our cloud. Customers will come to us asking to run models on AWS, GCP, or even a combination of the two - but with us, they can run it all in one place.

In terms of our partnerships with AWS and GCP, one of the big hassles we see customers experiencing today is acquiring compute. We’re trying to make it so that if you have compute, you can run your models wherever you want, and if you haven’t acquired compute, you can run them with us. Obviously, having access to compute at great prices in the regions you care about is a massive deal, and so we are prioritizing that.
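Concretely, once a model is running on Baseten - whichever cloud the compute lives in - serving it looks like a single HTTPS call. A hedged sketch of what that might look like; the model ID is hypothetical, and the URL shape and `Api-Key` header follow Baseten’s hosted-endpoint convention as we understand it, so verify both against the current docs:

```python
import os

import requests

# Hypothetical model ID used purely for illustration.
MODEL_ID = "abc123"

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"text": "Baseten makes inference fast."},  # matches the predict() sketch above
    timeout=30,
)
print(resp.json())
```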

How do you decide what to prioritize and build internally, given the pace of AI research and advancements in both software and hardware? 

A lot of our focus is driven by our interactions with our customers. We dedicate a huge amount of time to observing our customers and developing empathy for their needs from an ML infrastructure standpoint. It also comes down to what we wish existed in our own workflows.

One of the big patterns we see is that the market tends to be too reactive, and we work really hard internally to stay focused and not overreact. We know the market will fluctuate - if it goes one way today, it's probably going to go the other way tomorrow. We’re focused on building over long arcs of time. On the hardware side, for example, we wanted to partner with someone like NVIDIA - a larger partner who gives us defensibility and will constantly be pushing the state of the art.

You've emphasized the importance of inference, but also hinted at expanding into other areas like fine-tuning, training, and evaluations. Can you provide any insights into your thinking on this?

The next two things I can tease are orchestration and multi-cluster, and fine-tuning will come down the line. We know that fine-tuning is going to be a big market, and we’re trying to figure out with customers what the right abstraction to build would look like. It’s pretty hard to fine-tune today, so it’s more of a research focus than a product focus for us right now. Partnering with the right provider to start with is probably going to be the way we go, and eventually, we'll get into training.

Overall, we believe that inference is currently the most important piece to focus on, because it’s the stickiest. If you run the models, you'll develop a really good sense over time of the layer of abstraction people are building on top of them, and there are a ton of interesting things that come from being intertwined with that.

What’s the most challenging technical aspect of building highly-performant inference?

I don't think there’s one thing. Twelve months ago, people would say that the biggest challenge was the cold-start problem; then we got cold starts to under 10 seconds, and the problem went away. It’s hard to say that there’s just one thing that’s difficult - these systems are dynamic, and a number of things change every day.

People also say performance is the hardest thing, but I actually think performance is just one of many challenges. Building resilient infrastructure is also very difficult, and so is cost optimization. That said, Baseten has grown inference loads over 200x in the last twelve months, without a minute of downtime.

Baseten recently launched Truss, which is open source. Do you have a position on the debate between open-source and closed-source? 

The truth is, we think the future of AI is a combination of both open and closed source. We see that with customers right now, where they're taking proprietary models and off-the-shelf models and combining them in interesting ways. This is how we expect the ecosystem to evolve. At the end of the day, if there's only one provider of models, that’s definitely a little scary to me. Looking at the work that Mistral and Stability have done with open source models, there's a lot of value hidden away there.

You recently announced a significant Series B - any specific areas of focus for the next 12 to 18 months?

We’re really focused on continuing to build our product advantage. We think we have the fastest and most performant AI infrastructure product today, and our team is focused on doubling down on that. We have a pretty tiny team, including 2-3 go-to-market people, and so we want to be very product-focused but also aggressive in the market. I’d say the most important thing is continuing to build great stuff and talking about it. We’re working to stay rooted in bringing value to our customers and providing them with the best developer experience and product we can.

Lastly, tell us about the team culture at Baseten. What do you look for in prospective team members, and are you hiring?

I’d say we prioritize people who are smart, autonomous, and humble. We don’t like egos and we don’t take ourselves too seriously outside of work. Our culture is extremely autonomous - we like to give people big projects they’re proud of being a part of and can run with!

Conclusion

To stay up to date on the latest with Baseten, follow them on X (@basetenco) and learn more at Baseten.

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.