Langtrace - your AI-native observability solution 🎛
Plus: CTO Karthik on his vision for enabling developers to build reliable and accurate AI applications...
CV Deep Dive
Today, we’re talking with Karthik Kalyanaraman, Co-Founder and CTO of Langtrace.
Langtrace is a developer-focused GenAI observability platform built to improve the reliability and accuracy of AI-powered products. With the rise of LLMs, developers often face challenges moving from impressive demos to production-ready systems. Langtrace simplifies this transition by providing detailed insights into AI models, RAG workflows, and agentic frameworks, helping developers optimize performance and identify issues in real time.
Since its launch, Langtrace has gained traction with thousands of users across its managed platform and open-source offering, with notable companies like Google Cloud, Elastic, MongoDB, and CrewAI recommending it to their customers. By integrating with over 40 LLM providers, vector databases, and frameworks, Langtrace is quickly becoming an essential part of the stack for developers building AI-first products.
In this conversation, Karthik shares the journey behind Langtrace, the challenges of building for LLM observability, and the team’s vision for supporting developers in building reliable and accurate AI applications.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Karthik
Karthik - welcome to Cerebral Valley! First off, give us a bit about your background and what led you to co-found Langtrace?
My name is Karthik, and I’m the Co-Founder and CTO of Langtrace AI. We’re a venture-backed seed stage startup based in San Francisco, though our team of five works remotely. My co-founder, Ola Muse, also lives in SF.
The journey of Langtrace AI started with our parent company, Scale3 Labs, which we founded in August 2022. Before that, our team was at Coinbase, where we were part of the crypto observability team. While managing blockchain nodes at scale, we noticed a gap in the market—there wasn’t a solid observability solution for blockchain nodes. That led us to start Scale3 Labs and build a product to address this need. Over the past 18 months, we grew that product to six figures in revenue.
Then came the ChatGPT moment, which opened our eyes to new use cases for LLMs. We started experimenting by integrating AI features into our existing product, including a chatbot called NodeGPT and an intelligent log analyzer powered by GPT. While some users found these features incredibly valuable, others reported mixed results or issues with accuracy. We quickly realized we lacked a proper system to measure and improve the performance of our AI features.
As infrastructure and observability experts, we decided to build an internal tool, codenamed Langtrace, around July last year (2023). The goal was simple: measure and improve the accuracy of our chatbot. At first, the chatbot’s accuracy was around 60-65%. By tracking prompts, completions, and other inference metrics through Langtrace, we operationalized the process and improved the chatbot’s accuracy to 90-95% within weeks.
When we shared what we were doing with founders and developers, their enthusiasm pushed us to turn Langtrace into a standalone product. In February of this year, we decided to launch and open-source the project, building on OpenTelemetry standards. It’s been about nine months since launch, and Langtrace has grown to thousands of users on the managed service, with strong adoption in the open-source community as well.
That’s the story of how Langtrace AI came to be!
How would you describe Langtrace to an AI engineer or developer who isn’t as familiar?
Langtrace helps developers improve the accuracy of their AI-powered products. If you're building with LLMs, one of the toughest challenges is moving from a shiny demo to a production-ready product with the confidence that it delivers >95% accuracy to your users. Langtrace addresses this by surfacing insights into your entire AI stack—whether it's LLMs, frameworks, or vector databases—giving you a clear picture of what's happening under the hood.
Through its detailed dashboard, Langtrace lets developers analyze their AI agents, chatbots, or RAG-based applications. It surfaces your product’s AI API call stack in real time with rich metadata and lets you measure performance using manual and automated evaluations, ensuring your AI products are as accurate and reliable as possible. That’s the core of what Langtrace does.
🚀🚀Super excited to launch native support for @crewAIInc in @langtrace_ai . Check out a quick demo. And no it's not behind any waitlist, you can try it out right now! Reach out to me if you run into any issues, got feedback or need additional Langtrace credits!
— Karthik Kalyanaraman (@karthikkalyan90)
6:26 PM • Sep 4, 2024
Talk to us about your users today - who’s finding the most value in what you’re building with Langtrace?
Langtrace is mostly used by developers, especially those building with LLMs. The ones who get the most value out of it are developers working on products where the primary value comes from AI. For example, you could add an AI feature on top of Google Meet, but Google Meet would still have value without it. On the other hand, if your product is entirely powered by AI, that's where Langtrace really shines.
The typical trigger for adopting Langtrace is when developers have built a shiny demo but don’t know what to do next to measure and improve its accuracy. That’s when they start looking for observability solutions, come across Langtrace, and begin using it.
Could you highlight a couple of use-cases for Langtrace that are proving to be the most sticky with your early customers?
Langtrace is primarily used for two main use cases: AI agents and RAG (Retrieval-Augmented Generation). For AI agents, particularly those built with frameworks like DSPy, Langtrace provides detailed visibility into what’s happening behind the scenes during each session.
One example is a company in Uzbekistan working in the medical field. Their product converts patient medical reports into enhanced, detailed summaries that patients can easily understand. Initially, they struggled with predictability: some reports were generated correctly while others were inconsistent, forcing them to manually correct about 40% of the generated reports and limiting them to only 3–5 reports a day. By adopting Langtrace, which has first-party support for DSPy, they gained deeper insight into their AI pipeline, resolved the issues, and scaled their operations to over 20 reports per day.
RAG is another common use case, as it involves multiple systems working together, including vector databases, LLMs, and re-rankers. Langtrace provides visibility into each layer, from how queries are processed in the vector database to the LLM's final response. Developers can see the query flow, what the vector database returns, and how the LLM integrates those results into natural language. This insight helps developers incrementally improve their applications. These two areas—AI agents and RAG—are where Langtrace is making the most impact.
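To make the per-layer visibility described above concrete, here is a toy sketch of a RAG pipeline where each stage records a span. This is an illustrative assumption, not Langtrace's actual API or schema: the `record` helper, the hard-coded documents, and the stand-in `generate` function are all hypothetical.

```python
# Toy sketch of per-stage RAG tracing (illustrative only, not Langtrace's API).
# Each stage appends a span to an in-memory trace, so you can later inspect
# what the vector database returned and what the LLM produced from it.
trace = []

def record(stage, **data):
    """Record one span for a pipeline stage."""
    trace.append({"stage": stage, **data})

def retrieve(query):
    # Stand-in for a vector database query; returns canned documents.
    docs = ["Langtrace is open source.", "It builds on OpenTelemetry."]
    record("vector_db", query=query, returned=docs)
    return docs

def generate(query, docs):
    # Stand-in for the LLM call that weaves retrieved docs into an answer.
    answer = f"Based on {len(docs)} documents: {docs[0]}"
    record("llm", prompt=query, completion=answer)
    return answer

answer = generate("What is Langtrace?", retrieve("What is Langtrace?"))
for span in trace:
    print(span["stage"])  # → vector_db, then llm
```

Inspecting `trace` after a run shows exactly the query flow the passage describes: what went into the vector database, what came back, and how the LLM used it.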
An example of how I think about building and optimizing compound AI pipelines with DSPy and CrewAI . Strongly believe this is how compound AI systems optimized for high performance and reliability will be built. 👇
Step 1: Create individual projects for each block of my pipeline… x.com/i/web/status/1…
— Karthik Kalyanaraman (@karthikkalyan90)
8:02 PM • Sep 26, 2024
We've seen a lot of excitement around AI agents and the idea that autonomous systems will be able to complete a lot of different tasks. How are you thinking about integrating AI agents into Langtrace itself?
That's a great question. Right now, we’re primarily focused on helping developers build highly reliable and accurate AI agents. That’s the core mission of Langtrace. Every time an agent runs through Langtrace, we collect detailed traces about its performance. Currently, developers interpret this data themselves by looking at our dashboard.
Looking ahead, we’re exploring ways to leverage AI to interpret that data for developers. For example, if someone sets up an AI agent or a RAG pipeline with Langtrace, we could provide insights like, "Your agent successfully completed the task 9 out of 10 times, but here’s why it failed in this one instance. You might want to adjust this specific component in your stack." Delivering these actionable insights directly could save developers a lot of time.
Another idea we’re exploring is making Langtrace insights accessible beyond the platform itself. What if developers didn’t have to log in to Langtrace but could get these insights directly on Slack or via email? That’s part of our longer-term vision, perhaps in the next three to six months.
Right now, though, our main focus is on perfecting the tracing experience and expanding our integrations. We’ve already added support for nearly 40 integrations and are actively working to support even more. In practice, this means developers can integrate Langtrace with just two lines of code and get set up in less than two minutes.
What has been the hardest technical challenge around building Langtrace into the product it is today?
The hardest technical challenge for us has been multi-faceted. Initially, it was all about building a seamless developer experience, which meant integrating natively with all the providers. We aim to make Langtrace as simple as possible for developers to adopt: currently, it’s just two lines of code to integrate and takes less than two minutes. Our goal has always been to keep observability minimally intrusive, meaning it shouldn’t dictate how developers structure their code. To achieve this, we had to design SDKs that could capture traces asynchronously at low latency while requiring minimal effort from developers.
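The design goal Karthik describes, capturing traces asynchronously so instrumentation barely touches the hot path, can be sketched with the standard library alone. This is a minimal sketch of the general pattern, not Langtrace's actual implementation: the traced call only pays for an in-memory enqueue, while a background thread handles export.

```python
# Minimal sketch of asynchronous trace capture (assumed pattern, not
# Langtrace's real SDK): a decorator records a span per call and hands
# it to a background exporter thread via a queue.
import functools
import queue
import threading
import time

span_queue = queue.Queue()
exported = []  # stand-in for a real export destination

def exporter():
    # Background worker: drains spans off the queue and "exports" them.
    while True:
        span = span_queue.get()
        if span is None:  # shutdown sentinel
            break
        exported.append(span)
        span_queue.task_done()

threading.Thread(target=exporter, daemon=True).start()

def traced(fn):
    """Wrap a function so each call records a span without changing its API."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        # Enqueue only; the slow export work happens off the hot path.
        span_queue.put({
            "name": fn.__name__,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "output_preview": str(result)[:80],
        })
        return result
    return wrapper

@traced
def generate(prompt):
    # Stand-in for an LLM call; a real SDK would patch the client library
    # so developers don't have to decorate anything themselves.
    return f"echo: {prompt}"

print(generate("hello"))  # → echo: hello
span_queue.join()         # flush pending spans before reading them
print(exported[0]["name"])  # → generate
```

A production SDK would monkey-patch provider clients rather than require decorators, which is how a "two lines of code" integration stays possible, but the enqueue-then-export shape is the same.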
Another challenge was ensuring we gathered consistent feedback from developers to refine our integrations. We had to understand what developers cared about most and use that feedback to build and improve our integrations. We’ve now streamlined this process, making it much easier to handle new integrations.
Currently, our biggest challenge lies in scaling these systems and building a comprehensive platform that goes beyond just tracing. For example, Langtrace now includes features like evaluations and dataset management. Traced interactions often serve as great sources of production-specific data, and within Langtrace, developers can curate datasets from these interactions to run evaluations.
The evaluation space itself is vast and still evolving—approaches like using LLMs as judges are popular, but there are many other use-case-specific methods. The real challenge is creating a product that not only excels at tracing but also facilitates data generation, evaluation, and the iterative feedback loop needed to improve accuracy. We aim to deliver this tight feedback loop of traces, datasets and evaluations in a developer friendly way.
So while this is partly a technical challenge, it’s also a significant product challenge—one that we’re focused on solving every day.
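The traces → datasets → evaluations loop described above can be sketched in a few lines. Everything here is an illustrative assumption rather than Langtrace's real schema: the trace dicts, the curated `dataset` shape, and the exact-match `judge` (which stands in for the LLM-as-judge approach the passage mentions).

```python
# Hedged sketch of the traces -> datasets -> evaluations feedback loop.
# Data shapes and the judge function are illustrative, not Langtrace's API.

# 1. Traced interactions captured in production (prompt/completion pairs).
traces = [
    {"prompt": "2+2?", "completion": "4"},
    {"prompt": "Capital of France?", "completion": "Lyon"},
]

# 2. Curate a dataset: attach the expected answer to each traced interaction.
dataset = [
    {**traces[0], "expected": "4"},
    {**traces[1], "expected": "Paris"},
]

# 3. Evaluate. A real system might use an LLM as the judge; exact match
#    stands in for it here.
def judge(completion, expected):
    return completion.strip().lower() == expected.strip().lower()

def evaluate(rows):
    scores = [judge(r["completion"], r["expected"]) for r in rows]
    return sum(scores) / len(scores)

print(evaluate(dataset))  # → 0.5 (one of two completions matched)
```

Failed rows then feed back into prompt or pipeline changes, and the next batch of traces measures whether accuracy actually moved, which is the tight loop the answer describes.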
I was curious about how memory worked in @crewAIInc. CrewAI implements short and long term memories to it's agents using @embedchain which in turn uses @trychroma . All this can be visualized in Langtrace. What a great inter play of open source software!
— Karthik Kalyanaraman (@karthikkalyan90)
1:30 AM • Aug 30, 2024
How do you plan on Langtrace progressing over the next 6-12 months? Anything specific on your roadmap that new or existing customers should be excited for?
Our top priority right now is adding more integrations. While we already support most of the popular providers, we’re constantly getting requests for additional integrations, and meeting those demands is a key focus.
The second priority is improving the user experience. The product is only nine months old, and we’ve already iterated on it significantly—it looks very different from the initial version. We want to continue refining the UI and UX to make it as intuitive as possible so developers can immediately understand and benefit from it as soon as they set it up.
The third priority is creating a cohesive experience for developers building with AI agents, regardless of the framework they’re using. The landscape for agentic frameworks is evolving rapidly, with new options emerging almost daily. This is great for flexibility, but it also means each framework comes with its own unique requirements. For example, with CrewAI, users need detailed insights into agents, tasks, and tools, while LangGraph users require more stateful, graph-oriented capabilities. Our challenge is building a product that works seamlessly across all these frameworks.
Over the next three to six months, we’re focused on these goals—enhancing integrations, iterating on the product, and rolling out features to support developers using diverse frameworks.
Lastly, tell us a little bit about the team and culture at Langtrace. How big is the company now, and what do you look for in prospective team members that are joining?
We’re a fully remote, distributed team of five: two co-founders (Ola and I), two engineers (Ali and Obinna), and one business lead, Jay Thakrar, who is based in New York. I handle the product and technical side, while Ola focuses on the partner ecosystem and business development alongside Jay.
When it comes to team culture, we prioritize curiosity, energy, and strong communication skills. All of us are full-stack engineers, so we generally look for people who share that versatility and are passionate about AI. We don’t focus heavily on experience or specific technology expertise—if you’re curious, adaptable, and eager to learn, you’d fit right in.
Anything else you’d like our readers to know about Langtrace?
If you’re building with LLMs and finding it tough to improve accuracy or don’t know where to start, feel free to reach out! You can DM me on LinkedIn or Twitter—whether it’s with Langtrace or just general advice, I’m happy to help and chat.
For the last few months, we've been actively involved in OpenTelemetry's GenAI Special Interest Group meetings, with a keen focus on defining and building OTEL standards for GenAI applications alongside leading AI builders such as @Microsoft, @elastic, @Google, @traceloopdev, and… x.com/i/web/status/1…
— Langtrace.ai (@langtrace_ai)
3:22 PM • Nov 18, 2024
Conclusion
To stay up to date on the latest with Langtrace, learn more about them here.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.