Daily & swyx - Voice Agents Course and Community 📢

Plus: Daily CEO Kwindla Hultman Kramer on OSS voice agents, Pipecat, and their new online course with $10k in free credits...

CV Deep Dive

Today, we’re talking with Kwindla Hultman Kramer, Co-founder and CEO of Daily.

Daily is a real-time infrastructure platform powering low-latency audio and video experiences for developers. Originally launched out of Y Combinator in 2016, Daily started as a core audio/video company built on WebRTC, enabling teams to embed Zoom-like functionality directly into their apps. With the rise of LLMs and conversational agents, Daily has evolved into a foundational layer for real-time human-to-AI communication as well. Its open source framework, Pipecat, started life inside Daily and is now used and supported by teams at NVIDIA, OpenAI, Google DeepMind, AWS, and hundreds of startups building the next generation of AI-native experiences.

Recently, Kwindla launched a Maven course on Voice AI to help engineers and product builders go from zero to one in building real-time voice agents. The course—co-hosted with notable writer, founder, and AI thought leader swyx—includes 20+ expert-led sessions, hands-on office hours, and deep dives into tools being used in production today. It’s designed to be flexible, community-driven, and vendor-neutral, with contributions from top teams across the AI landscape, including OpenAI and DeepMind, as well as early-stage infra startups building the future of voice-native interaction.

In this conversation, Kwindla shares how Daily evolved from WebRTC pipes to a key enabler of voice agents, why real-time AI has unique infrastructure demands, and how Pipecat became the most widely used voice agent framework in the world.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Kwindla 💬

Kwindla, welcome to Cerebral Valley! First off, introduce yourself and give us a bit of background on you and Daily. What led you to start Daily back in 2016?

Hey there! I'm an engineer. I've been working on large-scale real-time networking and real-time audio and video for almost my entire career. I co-founded Daily because I believed the Internet was increasingly moving toward real-time audio and video, and more people would want to communicate that way. There was a standard called WebRTC that was just getting off the ground around 2015–2016, and I thought we could build terrific global infrastructure for audio and video communications, along with developer tools on top of that.

So we went through Y Combinator in Winter 2016 and grew in that core audio and video business for several years. When GPT-4 came out, we realized that the infrastructure we had built to help humans talk to humans could also help humans talk to AI. That was interesting from both an engineering and a business perspective.

We're an engineering-driven team, so we try to build things we think are genuinely interesting. And when GPT-4 launched, nothing was more compelling than the new set of possibilities it enabled. If you could really have a conversational interaction with GPT-4, what did that mean? What did it open up? We helped our customers and partners ship the first LLM voice agents that could operate at human conversational speed.

We built a bunch of tools around that—both for our own experimentation and for our customers. Those tools turned out to be compelling enough that we decided to open source them to help accelerate the broader developer ecosystem around real-time AI. The project, called Pipecat, has now become the most widely used real-time agent framework. It’s used by NVIDIA and supported by teams at Google DeepMind, OpenAI, AWS, and hundreds of startups.

How would you describe Daily to the uninitiated developer or AI team?

What we do at Daily is help developers build real-time communications. If you want to build something like Zoom but fully embedded into your own website or native mobile app, you use our SDKs and send the audio and video traffic over our highly optimized, low-latency global infrastructure.

When new AI possibilities started to come into focus, our initial thinking was simple: we already route audio and video over this infrastructure, and it doesn't really matter whether it's a human or an AI on one or both ends of the call. That core—very low-latency audio and video infrastructure—is still what we do at Daily.

On top of that, the AI-specific use cases are all open source in our ecosystem. Pipecat is a totally open source, vendor-neutral project. You can use Pipecat with Daily, with your own WebSocket servers, or even with a telephony provider like Twilio. We think it's important for Pipecat to be completely flexible, vendor neutral, and open source.

So if you want to move audio and video packets over the network using our infrastructure, great—you can do that with Pipecat. If you want to move them in some other way, that works too.
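
To make that concrete, here is a minimal sketch of a cascaded Pipecat agent running over a Daily transport. Import paths and parameter names follow recent pipecat-ai releases and may differ in the version you install, and a production agent would also add context aggregation and voice activity detection, which are omitted here for brevity.

```python
# Minimal cascaded voice agent sketch: Daily WebRTC transport feeding
# STT -> LLM -> TTS. The transport is the only Daily-specific piece; a
# WebSocket or Twilio transport slots into the same position.
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.openai import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    transport = DailyTransport(
        os.environ["DAILY_ROOM_URL"],  # any Daily room URL
        None,                          # meeting token, if the room needs one
        "voice-agent",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )

    pipeline = Pipeline([
        transport.input(),             # caller's audio in
        DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"]),
        OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o"),
        CartesiaTTSService(
            api_key=os.environ["CARTESIA_API_KEY"],
            voice_id="your-voice-id",  # placeholder
        ),
        transport.output(),            # agent's audio out
    ])

    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```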

You recently announced your Maven course, which covers the topic of Voice Agents via tech sessions and community events, as well as $10k in credits. Talk us through the inception of this course - how did this come about?

There were two threads there. One was that as the open source Pipecat community grew larger, more diverse, and more active, it became clear that it would be helpful to have a comprehensive “getting started with voice AI” guide. We had been keeping notes on all the questions people asked in Discord, in our shared Slack channels, and all the interesting things we saw posted on social.

For the AI Engineer Summit in New York this past February, we blocked out a week to write a thorough guide. Conferences are a great forcing function because of the deadline, and we wanted to do a print run for the event—something people could physically hold as they started to dig into voice and real-time conversational AI.

The print run at the summit got so much positive feedback that we ended up putting the guide online. In line with Pipecat’s whole philosophy, it’s not about how to use one commercial platform. It’s more like: here’s what voice AI looks like—with a bunch of sample code, mostly in Pipecat because that’s what we’re familiar with—but the material is meant to be completely general.

How did that lead to the full-fledged course you have today? 

We put the guide online and spent a few more days updating it, because everything in AI moves so fast that if you leave anything untouched for a month, it’s already outdated! Once it went up, people started saying, “Maybe we should have reading groups around this,” or “Maybe we should talk through it in Discord.” That reminded me of this amazing experience I had last year in a Maven course run by Hamel Husain and Dan Becker. It turned into what swyx called the “Woodstock of AI.” Tons of people joined, and it became a really fun, community-driven thing with sessions led by experts on evals, fine-tuning, model architecture—just a ton of great content.

So I thought, if we’re going to do something around the guide, why not use that experience as a template and build something similar for voice AI? I talked to Hamel about how he ran it, I talked to swyx, and we put the course up on Maven. (Register for the course here.) We figured if it snowballed the way Hamel’s course did, that would be amazing for the community. And it did.

Now we’ve got 15 companies sponsoring the course with free credits, and another dozen or so expert contributors—some independent, some from companies like OpenAI and Google DeepMind, where there are always 10 different things happening that people want to hear about. We’re probably going to end up with 20 or so sessions over the month. Some are general overviews following the illustrated guide, and some are deep dives or office hours on specific tools that people should know about and are already using in production for voice.

Who is this course specifically aimed at? Are you targeting professional engineers or AI enthusiasts building on the side?

In my mind, the ideal participant is someone who’s either an engineer or a product person actively building something in voice AI. If you’re just generally curious about voice AI, you’ll still get something out of the course (though you might want to just skim the guide). But if you’re actually building, you’ll get a huge amount of value.

The content is broad enough that you can go deep on the engineering side, or take a more comprehensive, product-oriented view like a PM would. But the core audience is definitely people who are actively building or seriously about to start. That’s who this is really for.

You’re co-hosting the course with well-known AI thought leader, swyx. Talk to us about how that collaboration came about and what makes swyx special. 

I’ve always loved collaborating with swyx. The events he runs, the conversations he facilitates online, the podcasts—he just does a great job. So when people started suggesting we do something live or more community-oriented using the voice AI guide as a starting point, I immediately thought of swyx. He’s helping organize, and we pulled in a bunch of people that one or both of us know to lead special topic sessions.

Some of those are focused on what their companies are building—tools that people working on voice AI should know about. Others are on broader topics like hardware in voice AI, training audio models, or evals for real-time conversational AI.

We're likely going to have 20 to 25 Zoom sessions throughout the course. Some will be broad overviews—covering the big themes—and others will be deep dives or office hours with specific experts.

Which areas of the course do you think are going to be the most compelling for your target audience? Give us a sense of how you're thinking about curating the programming. 

There are three conversations I keep having with people—first as they get started, and then as they move from prototype to production. The first is the broad landscape. If you're building a voice AI agent, should you use one of the new speech-to-speech models, or should you stitch together three separate models—transcription, an LLM in text mode, and voice generation? There are half a dozen big questions like that. It’s helpful to give people a lay of the land—what tools people are using, why, and how those choices change depending on what you're building. I think of that as the landscape overview.
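
As a rough illustration of that first question, the two architectures differ mainly in pipeline shape. The sketch below uses hypothetical Pipecat-style builders; the transport and model objects passed in are placeholders, not specific products the course recommends.

```python
# Two common voice agent shapes, as hypothetical pipeline builders.
from pipecat.pipeline.pipeline import Pipeline


def build_cascaded(transport, stt, llm, tts) -> Pipeline:
    """Three swappable models: maximum control over each stage, at the
    cost of three model hops in the latency budget."""
    return Pipeline([
        transport.input(),
        stt,   # speech -> text (a transcription service)
        llm,   # text -> text (any chat model)
        tts,   # text -> speech (a voice generation service)
        transport.output(),
    ])


def build_speech_to_speech(transport, realtime_model) -> Pipeline:
    """One audio-native model handles listening and speaking directly:
    fewer hops and native prosody, but less control over intermediate
    text and tool behavior."""
    return Pipeline([
        transport.input(),
        realtime_model,
        transport.output(),
    ])
```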

The second conversation is: “Okay, I’ve built something that works—how do I actually deploy and host it in the cloud?” Real-time AI is different from standard web applications. You need a different tech stack, and it’s not like you can go to a hyperscaler and just hit a “deploy voice agent” button. At least not yet. In Discord, people are always asking: “How do I scale this to 100 people? 1,000? What if I hit 10,000 users a month?” So that’s a whole separate track. We’re doing both an overview and deeper sessions on different pieces of that. And we’ve got several companies participating in the course that can help with cloud hosting and deployment.
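
One pattern behind those scaling questions: a voice session is a long-lived, stateful process rather than a stateless request, so a common approach is to cap concurrent sessions per instance and scale horizontally. Here is a minimal, hypothetical sketch using FastAPI, where the echo loop stands in for a real agent pipeline:

```python
# Voice agents are long-lived, stateful sessions, not stateless requests:
# one agent loop per connection, a hard cap per instance, and horizontal
# scaling beyond that. run_voice_agent is a placeholder for a real
# pipeline reading and writing audio frames over this WebSocket.
import asyncio

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

MAX_SESSIONS_PER_INSTANCE = 20  # tune to CPU/memory headroom per machine
sessions = asyncio.Semaphore(MAX_SESSIONS_PER_INSTANCE)


async def run_voice_agent(ws: WebSocket) -> None:
    """Stand-in agent loop: echoes audio frames back to the caller."""
    try:
        while True:
            frame = await ws.receive_bytes()  # e.g., 20 ms PCM frames
            await ws.send_bytes(frame)        # a real agent responds here
    except WebSocketDisconnect:
        pass  # caller hung up


@app.websocket("/agent")
async def agent_endpoint(ws: WebSocket):
    await ws.accept()
    if sessions.locked():
        # Instance is at capacity; tell the client to try elsewhere.
        await ws.close(code=1013)  # 1013 = "try again later"
        return
    async with sessions:
        await run_voice_agent(ws)
```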

The third big topic is evals. Everyone in 2025 is trying to move beyond vibe-based evaluations. We’re going to spend a lot of time digging into evals—what’s working, what’s next, and how to build something more rigorous.
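
For a flavor of what moving beyond vibes can look like, here is one common pattern (an assumption on my part, not the course curriculum): score each turn of a recorded call transcript with an LLM judge against a fixed rubric, then aggregate across agent versions. The sketch uses the OpenAI Python SDK; the model choice and rubric are placeholders.

```python
# One turn-level eval pattern: an LLM judge scores each agent reply in a
# saved call transcript against a rubric, returning JSON you can aggregate
# across agent versions. Model and rubric here are placeholder choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are grading one turn of a voice agent conversation. Return JSON "
    "with integer scores 1-5 for 'correctness', 'brevity', and "
    "'task_progress'. Good voice replies are short and speakable."
)


def judge_turn(user_text: str, agent_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[
            {"role": "system", "content": RUBRIC},
            {
                "role": "user",
                "content": f"User said: {user_text}\nAgent replied: {agent_text}",
            },
        ],
        response_format={"type": "json_object"},
    )
    return response.choices[0].message.content


# Transcript-level judging misses timing issues (slow first response,
# clumsy interruption handling); those need separate latency metrics.
```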

Talk to us about some of the awesome partners you’re involving! 

One of the things I loved about the course last year was how many people brought their expertise in building infrastructure-level and service-level tools. It was very collegial—everyone trying to figure out how to build AI together, all iterating and learning from each other. Sometimes you’re competing, sometimes you’re partnering. I wanted to bring that same spirit to this course for everyone investing their time in it.

Last year, I got exposed to a bunch of great fine-tuning, hosting, and serverless GPU tools through that course, so I reached out to a few people we’ve worked with and asked if they wanted to join. They immediately got it and jumped in. Then once I started posting about it on social, even more people reached out.

I’ve tried to curate it carefully—it’s all tools you'd actually want to use if you’re scaling up in voice AI. I also brought in a bunch of subject matter experts who aren’t building a startup or commercial tool. Some have a consulting mindset, others come from more of an academic background.

So I think it’ll be a great mix of companies and subject matter experts.

How are you thinking about formatting the course from a timing perspective? 

I’m kind of modeling the structure based on my own preferences when I take courses. It has to be totally flexible because people have very different availability. Some folks are on sabbatical and this is their main focus. Others are working 80 hours a week trying to ship something and will just catch pieces of it when they can.

Everything will be on Zoom and fully recorded so you can participate asynchronously. We’ll also have a Discord, and based on how many people have signed up and how excited they are, I think the Discord is going to be a lot of fun. We’ll have multiple channels, and each week we’ll run one or two big overview sessions, along with smaller office hours and special topic sessions.

We’re aiming to schedule sessions in California-morning time slots, which work well for a lot of people globally, but we’ll also move things around based on where participants are joining from. Some people will likely attend a lot of live sessions; others will mostly watch recordings and engage in Discord—and we’re designing it to be a great experience either way.

It’s not a lesson-based or project-based course. That said, we will have a lot of lesson-like material available in the Maven portal. But the focus is more on hearing from experts, having live Q&A, and participating in hands-on office hours with the tools you’re most interested in.

Is this course a project you could see yourself doing long-term? 

That’s such a great question, and we’ll just see how it goes. I’m super excited about it, and there’s clearly enough interest from others that it’s absolutely worth doing this one. I can imagine doing it again, or doing a different spin on it, but we’ll have a better sense once we’ve gone through it.

We tend to treat things like this as experiments. If something resonates, we keep it going. One example is the San Francisco Voice AI monthly meetups we run. A year ago in January, we did one almost on a whim—we emailed a few people, they forwarded it around, and 40 people showed up, which was 30 more than I expected. So we did another one in February, and 60 people came. Then 80 in March. Now about 150 people show up every month. We live stream it, people tune in on YouTube—it’s become a whole thing.

If this course feels like that by the end, then we’ll keep doing it or build on it in some way. Or maybe it’ll just be a one-off—because I’m definitely overcommitted now. I have to run this course because people are excited, which is great, but I’m not sure if I’ll have the bandwidth to do it again. Maybe someone who takes the course will pick it up and run with it next time.

Anything else you'd like our readers to know about the Voice Agents course? 

First, if you participate in this course, you're going to be part of a real community experience. And to me, when something like this really comes together, that community aspect is just as valuable—if not more—than any specific piece of content you might learn. We’re going to put a lot of effort into making it a strong, supportive community.

Second, if you’re building something now—or thinking about starting—you’ll have a ton of hands-on support and touchpoints with people across the spectrum. That includes folks from OpenAI and Google DeepMind, all the way to early-stage startups building innovative evals and infrastructure for real-time AI. It’s a pretty special opportunity to ask questions directly to the people building the tools you might actually use, all within a great community environment.

Conclusion

Stay up to date on the latest with the Voice Agents course: learn more here.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.