OpenPipe fine-tunes your faster, cheaper, better model 💸
Plus: CEO Kyle Corbitt on fine-tuning, AI Engineers and YC...
CV Deep Dive
Today, we're talking with Kyle Corbitt, Co-founder and CEO of OpenPipe.
OpenPipe is a fully-managed fine-tuning platform for developers. The startup aims to replace your GPT-4 prompts with fine-tuned models, lowering latency and cost while improving quality within your AI workflows. OpenPipe offers a fully managed workflow spanning data collection, data refinement, fine-tuning, evals and monitoring.
Today, OpenPipe has thousands of AI Engineers using its platform for fine-tuning, including at established companies like Rakuten. Just this week, the startup announced a $6.7m Seed round led by Costanoa Ventures with participation from Y Combinator, Logan Kilpatrick (former head of DevRel at OpenAI), Alex Graveley (creator of GitHub Copilot), Tom Preston-Werner (founder of GitHub), Flo Crivello (founder of Lindy), Immad Akhund (founder of Mercury), Goodwater, and many others.
Thrilled to announce our $6.7M seed round to help you replace GPT-4 with your own fine-tuned models!
Round was led by @costanoavc alongside @ycombinator, @OfficialLoganK, @alexgraveley, @Altimor, @mojombo, @immad, @Austen and many others. ❤️
— Kyle Corbitt (@corbtt)
2:15 PM • Mar 26, 2024
In this conversation, Kyle walks us through the founding premise of OpenPipe, why fine-tuning is the future of the developer AI stack, and OpenPipe's goals for the next 12 months.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with Kyle 💬
Kyle - welcome to Cerebral Valley. First off, just give us a bit about your background and what led you to start OpenPipe?
Hey there! I'm Kyle, Co-Founder and CEO of OpenPipe. I graduated in CS and have always been excited about machine learning - in fact, I almost entered the field right out of college.
The one thing I was more excited about than ML, though, was starting a company, and when I graduated in 2013, there wasn't a place for a new AI-focused startup to emerge - deep learning was incredibly promising, but you needed a ton of data and a successful product before you could apply it. So, I founded a startup in a completely unrelated field, before eventually joining Google. In 2017, I left Google to join YC, where I led Startup School and managed YC's external-facing programs and content produced for non-YC founders. I worked there for 4 years, but eventually jumped back into the world of startups at the start of the generative AI wave.
In March 2023, my brother David and I started exploring startup ideas together. This was around the launch of GPT-4. It became clear to us that GPT-4 would let new AI-first companies build from 0 to 1. Specifically, you no longer need to have millions of labeled datapoints from existing users - now, you can start from just an idea and get to a prototype super fast, and that's a huge change in the way the world works.
That said, getting from 1 to 100 still felt much more difficult - we looked at the unit economics of scaling existing GenAI products, and it was clear that this was a blocker for many great ideas. We immediately started exploring the area of fine-tuning on smaller open-source models like FLAN-T5 to see if we could get from 1 to 100. Once we validated that smaller models could match large ones for many use cases and that this was a problem that real developers faced, we jumped in with both feet.
With OpenPipe, we're trying to get you from a prompted model that's great for an MVP to a production model fine-tuned on your use case. This lets you decrease latency, cut costs, and increase accuracy. All of those things are achievable through fine-tuning, and OpenPipe just makes that process really smooth and simple for AI engineers.
We just officially launched as a YC company, and announced our fine-tuning functionality!
With OpenPipe fine-tuning you can automatically convert your expensive LLM prompt into a cheap, fast fine-tuned model. Check out more details in our launch at
— OpenPipe (@OpenPipeAI)
3:31 PM • Aug 28, 2023
How would you describe OpenPipe to a new AI engineer or ML researcher?
With OpenPipe, we take you to the next step after your MVP. Say you spin up a prototype and you're getting a few tens of thousands of requests a day - you can use our SDK, which just wraps OpenAI's SDK and transparently passes everything through to OpenAI the way it worked before. We make it super easy to collect all of the prompts and responses you're using with a model.
The difference is, as you're sending those requests and getting responses back, you're actually creating a dataset that you can use to fine-tune in the future. All of those requests and responses get uploaded to your OpenPipe account, and you can then use that raw input data to create a model. Once you have that, you can filter it down - for example, if there's a request that failed, you can add a tag using our SDK to say "this one succeeded and this one failed, and I only want to train my model on the responses that succeeded".
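As a rough sketch of that flow in Python, assuming the drop-in pattern Kyle describes (exact parameter names may vary by SDK version):

```python
# pip install openpipe
from openpipe import OpenAI  # drop-in replacement for openai.OpenAI

# Calls pass through to OpenAI unchanged; each request/response pair
# is also logged to your OpenPipe project as future training data.
client = OpenAI(openpipe={"api_key": "opk_..."})

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this article as bullets: ..."}],
    # Tags let you slice the captured logs later, e.g. train only on
    # requests your downstream logic marked as successful.
    openpipe={"tags": {"prompt_id": "summarize-v1", "outcome": "success"}},
)
print(completion.choices[0].message.content)
```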
We also have a lot of semi-automated filters built into our product. You can say "I want to take these 10,000 examples that I collected from real-world usage and pass them through GPT-4". For example, say you have a summarization prompt that should always return results as valid Markdown in bullet-point form. If you give those instructions to GPT-4, maybe 5% of the time it will return something other than bullet points. You want to eliminate those examples from your dataset.
So, we have a very clean flow in our system where you can go in and say "here are my 10,000 request logs - can you filter out all of the ones that don't match these criteria" and it'll do that for you. Now you have a much cleaner dataset, which is ultimately going to give you a higher-quality model that does what you want.
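The in-product flow is point-and-click, but as a standalone sketch of the kind of check such a filter applies (the JSONL log format here is hypothetical):

```python
import json

def is_bulleted_markdown(text: str) -> bool:
    """True if every non-empty line is a Markdown bullet point."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(("- ", "* ")) for line in lines)

# Hypothetical JSONL export: one {"prompt": ..., "response": ...} object per line.
with open("request_logs.jsonl") as f:
    logs = [json.loads(line) for line in f]

clean = [log for log in logs if is_bulleted_markdown(log["response"])]
print(f"Kept {len(clean)}/{len(logs)} examples for fine-tuning")
```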
This is the flow everybody operating at scale is finding independently (but if you're not at scale probably easier to stick with OpenAI for simplicity for now)
— Kyle Corbitt (@corbtt)
1:40 AM • Oct 26, 2023
Who are your users today? Who's finding the most value in what you're building with OpenPipe?
We definitely have a mix of ML developers, AI engineers and AI researchers. For example, we have ML PhDs from top institutions using us because, even though they could do all of these steps on their own, it's faster for them to outsource and use a really efficient product like OpenPipe. We also focus a lot on developer experience and making it literally 3 clicks to get from your dataset to a fine-tuned model.
The most common profile for us is what we now call an "AI engineer". This is someone who understands prompting and the problem they're trying to solve, but isn't super deep on the hardcore ML stuff. They have a general idea of how a fine-tuned model can benefit them, but they don't have a lot of experience actually training models themselves. We've intentionally designed our platform for that user, to make it very easy for them to be successful.
We also have a lot of built-in heuristics we've developed through our internal tests to optimize all the training hyperparameters for our customers. By default, you don't even have to set the number of epochs or the learning rate - we have really strong general heuristics for all of that. We also run analyses on your dataset in the background as you're starting a training run, so that we can give you the best-quality model without you having to be an expert or build that intuition yourself.
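OpenPipe hasn't published those heuristics, but purely as an illustration of the idea, a defaults-picker might scale epochs inversely with dataset size, along these entirely hypothetical lines:

```python
def default_hyperparams(n_examples: int) -> dict:
    """Hypothetical illustration: derive training defaults from dataset
    size so users never have to set them by hand."""
    if n_examples < 1_000:
        num_epochs = 4   # small dataset: more passes per example
    elif n_examples < 10_000:
        num_epochs = 2
    else:
        num_epochs = 1   # large dataset: one pass limits overfitting
    return {"num_epochs": num_epochs, "learning_rate": 2e-4, "lora_rank": 8}

print(default_hyperparams(8_500))  # {'num_epochs': 2, 'learning_rate': 0.0002, 'lora_rank': 8}
```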
Have there been any use-cases for OpenPipe that have surprised or excited you, that you didn't previously anticipate?
When we launched, we wanted to be really good at structured data - classification and information extraction in a very specific format. We thought we could do a really good job here because, while these smaller models might not be as capable as GPT-4, they're powerful enough for bounded tasks where it's just about understanding the instructions and extracting information. Evals also become a lot easier in this case, because you can do direct comparisons with your test set. So, that's where we started and what we were focused on initially.
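For a bounded extraction task, that direct comparison can be as simple as exact-match accuracy against held-out labels - a minimal sketch with hypothetical field names:

```python
def exact_match_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Fraction of test examples where the extracted fields match the labels exactly."""
    matches = sum(pred == label for pred, label in zip(predictions, labels))
    return matches / len(labels)

# Hypothetical held-out test set for an invoice-extraction task.
labels = [
    {"invoice_id": "123", "due_date": "2024-04-01"},
    {"invoice_id": "456", "due_date": "2024-05-15"},
]
predictions = [  # the fine-tuned model's outputs, parsed from JSON
    {"invoice_id": "123", "due_date": "2024-04-01"},
    {"invoice_id": "456", "due_date": "2024-05-16"},  # wrong date
]
print(exact_match_accuracy(predictions, labels))  # 0.5
```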
We immediately started getting strong demand for a broader range of GenAI use cases, from summarization and QA to chatbot support, and we've since pivoted to becoming more general. I will say that unstructured text generation is not what we thought these smaller models would be good at, and the most surprising thing has been how great Llama 7B or Mistral 7B are across use cases that GPT-3.5 wouldn't be optimal for. If you're willing to do fine-tuning, then even these smaller models - smaller than GPT-3.5 - are able to do a really good job. That's been my biggest surprise.
And instead of OpenAI... you can fine-tune open-source models
Already exists via API: openpipe.ai, or as open source SDK github.com/openpipe/openp…
Impressive stuff @corbtt @DavidCorbitt9 (cost figure below)
— Anton Osika (@antonosika)
3:20 PM • Sep 6, 2023
How does your team balance the need for AI research with productization, given how fast AI breakthroughs are taking place in 2024?
If I were to average the effort of our engineering team, 70% is on the productization side and 30% is deep on the research side. That feels like the right balance for now. Honestly, I think the productization side is where there's the biggest gap in the ecosystem. There are all of these fantastic techniques that are out there and well explained, and maybe exist in transformers or some other library, but there's not a good way to use them. So, I see our role as less about doing our own fundamental research - although I think that will increase over time - and more about exposing those techniques to our customers. We see this as a very unsolved problem.
In terms of fine-tuning, what do you think differentiates your approach relative to other teams that are focused on this specific area?
The space of fine-tuning has been really interesting to us. There are online services where you can bring your own dataset and do a fine-tune, but I haven't seen anyone who really helps you with the full journey, and personally, we think that's the biggest opportunity. We're a full-stack experience, all the way from dataset preparation, to fine-tuning and serving the final models. As a result, we do have to worry about the uptime and other elements like that - frankly, when we were in YC and working with other investors, we got a lot of advice to focus on one thing and just stick to that.
In our case, though, doing the whole stack was necessary because otherwise there are too many pieces that a user has to put together on their own. For example, we have a very well-developed eval suite, so you can compare your fine-tuned model to the prompts you were using before. Again, folks told us "don't try and build that all in one product", but I think we've benefited from that because those are all pieces that an engineer needs to successfully deploy a fine-tuned model. If we don't give it to them, they're not going to be successful - it's just too hard to connect those pieces.
To answer your question of "what are other people not doing?", I think it's exactly that - I don't see anyone else who's even trying.
Curious about Llama 2? Here's a fun feature we shipped last week: automatically convert your GPT-3.5 prompt to the Llama 2 format with best practices!
Play with Llama 2, Claude 2, and GPT models at openpipe.ai
— OpenPipe (@OpenPipeAI)
5:16 PM • Aug 5, 2023
Given your new fundraise, which areas are you going to focus on over the next 6-12 months? What are the biggest priorities that lie ahead?
We have a really strong product roadmap that I'm extremely excited about. Firstly, I think we can improve our model quality - for example, decreasing errors by another 50% just by being a little bit smarter about the way we do filtering and labeling on the data. It's actually a little shocking to me that more people aren't working on that problem.
The second important piece is helping educate the community on what they can gain from fine-tuning. Our conversion rate from a customer uploading their dataset all the way to using us in production is incredibly high. We'll often onboard a user and the common response will be "whoa, I didn't think this smaller model could do well on my task". So, educating users on the fact that we do the heavy lifting for them - so they save money and get lower latency - is really important to us.
Overall, I'd say user education, getting our name out there, and explaining the benefits to developers is the big focus for us.
Could you give us some insight into just how big of an effect OpenPipe has on latency, cost, and quality?
At this point, most of our new users are transitioning off GPT-4 or GPT-4-turbo, and coming from those, you're going to end up with about a 14x reduction in cost, and a 5x reduction in your average latency.
One piece of feedback we've gotten recently from a couple of customers is that they're frustrated they didn't find us earlier, because they've built almost their entire flow around compromising for latency. They have to implement streaming and add UX elements like 3 blinking dots just to make the latency not feel bad! With OpenPipe, you can switch to a solution that has much lower latency and requires much less engineering effort too, because you don't have to design around that problem anymore.
As far as quality improvement, measuring this is very task-dependent. In most cases, you're able to get similar performance to what you would get with GPT-4, and sometimes it's a lot better - especially if you have a way to pre-process your own data and you're not training directly on GPT-4's outputs, but have some secondary process to distinguish between good and bad responses. Otherwise, from a quality perspective, it's going to be similar to GPT-4.
How to build a sustainable AI moat from 0:
1. Get to PMF using GPT-4
2. Replace GPT-4 with a good-enough model fine-tuned on your task
3. Gather success/failure cases
4. Relabel the failures, keep retraining
5. Better model -> more users -> better model, virtuous cycle.
— Kyle Corbitt (@corbtt)
7:00 PM • Nov 14, 2023
How would you describe the culture at OpenPipe? Are you hiring, and what do you look for in prospective team members?
The first thing we look for is raw ability. Assuming we've cleared that, then we're looking for someone who's excited about what they're doing and wants to take a lot of ownership. I want to work with people who I can tell "hey, we have this problem" and then they come back and say "here's what I think the best solution is". We are very collaborative, but look for people who aren't just waiting to be told what to do. Today, I think we have a very healthy culture, both with my co-founder and with the engineers we've hired since.
I also think in-person is super important - we don't hire remotely, and we want people who can be on site with us every day in Seattle. We've turned down some really good candidates because they wanted to be hybrid or remote, and I just don't think that's going to work.
We're looking for a few exceptional engineers to join our team in Bellevue, WA. If you think that might be you, we would love to have a chat.
How did your experience at YC affect the way you lead OpenPipe today?
Firstly, YC was really eye-opening for me to just see what "great" looks like - given the partners, the team and the founders that were coming through the program each batch. You really got a sense of "this is the level that the very best people operate at", and it pushes me to be more ambitious, work harder and raise the bar higher with the folks that we want to work with.
Secondly, I learned the importance of speed. If you look at the companies that succeed, there's such a strong correlation between how fast they iterate, launch new features, and ship, and their growth and success. That's something we've really taken to heart. We launch really fast, we build a lot of stuff, and we experiment.
Who are the absolute best open-source fine-tuners? dm me if you're open to a contract or a job. you'll be spending all day researching and playing around with the latest research and techniques in fine-tuning/dataset prep.
competitive pay and huge impact.
— Kyle Corbitt (@corbtt)
4:55 PM • Jan 22, 2024
Conclusion
To stay up to date on the latest with OpenPipe, follow them on X (@OpenPipeAI) and learn more at OpenPipe.
Read our past few Deep Dives below:
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.