
Fiddler is enriching AI observability for the enterprise 🔍

Plus: Founder/CEO Krishna on addressing hallucinations...

CV Deep Dive

Today, we’re talking with Krishna Gade, Founder and CEO of Fiddler.

Fiddler is an enterprise AI observability platform designed to ensure transparency and trust in AI applications. Founded by Krishna Gade in October 2018, the company’s mission is to provide enterprises with key insights into their AI models' performance and risks, making it easier for companies to deploy AI in a production setting. Fiddler’s platform also offers monitoring, explainability, and analytics for AI models, addressing issues like safety, hallucinations, and operational risks.

Today, Fiddler’s monitoring capabilities and dashboards serve a wide range of customers, from early-stage startups to large enterprises across various sectors, including financial services, tech companies in the Bay Area, hospitality, and SaaS companies. The startup last raised a $32M Series B round in June 2021 led by global venture capital and private equity firm Insight Partners, with participation from existing investors Lightspeed Venture Partners, Lux Capital, Haystack Ventures, Bloomberg Beta, Lockheed Martin, and The Alexa Fund.

In this conversation, Krishna takes us through the founding story of Fiddler, the challenges of building an AI observability platform, and their roadmap for the next 12 months.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Krishna 💬

Krishna - welcome to Cerebral Valley! First off, give us a bit about your background and what led you to found Fiddler? 

Hey there! My name is Krishna and I’m the founder and CEO of Fiddler AI. We’re an enterprise AI observability platform based in the San Francisco Bay Area, and our mission is to build trust and transparency into AI. We started five years ago, so we are a pre-generative AI startup! Before Fiddler, I focused on AI and machine learning at Facebook for Newsfeed, especially for building and deploying ranking models. Prior to that, I worked at Pinterest, Twitter, and Microsoft on data analytics and search quality issues. 

At Facebook, I encountered the problem of model explainability because we were deploying many large, deep learning models for recommendations and ranking. Part of the challenge was understanding why certain news recommendations appeared in the feed or why specific ads were shown. My team built tools like "Why am I seeing this?" to provide human-readable insights into the workings of AI models at Facebook. This inspired me to start Fiddler because I felt that while companies like Facebook and Google could build these tools in-house, other enterprise companies could benefit from similar capabilities.

We started Fiddler with the idea of creating an observability infrastructure for AI workflows. Our platform instruments AI, monitors AI applications from predictive models to generative applications, and provides insights into operational performance risks like safety issues and hallucinations. Essentially, Fiddler helps companies deploy AI with confidence by minimizing risks and maximizing ROI around AI. That’s what we do, and it’s crucial for AI to be successful and adopted in the enterprise. Trust and safety are key, and I’m glad to be talking about it on your program as well.

Give us a top-level overview of Fiddler - how would you describe the company to those who are less familiar with you?

These days, many customers are building generative AI apps, like chatbots, summarization applications, or Q&A tools on their enterprise data. They might be storing all their policy documents or customer support documents in a vector database. They might use a modeling solution like OpenAI or open-source language models like Mistral or Llama, and they would build a retrieval-augmented generation (RAG) application. This application queries the data, sends it to the LLM, and makes the LLM come back with a coherent response that can be sent to the end user. This all works well if it functions accurately 100% of the time.
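To make the RAG flow Krishna describes concrete, here is a minimal sketch of the query, retrieve, prompt, respond loop. Everything in it is an assumption for illustration: the documents are made up, `retrieve` uses a simple bag-of-words cosine similarity as a stand-in for a real vector database, and `fake_llm` is a hypothetical placeholder for a call to a hosted or open-source model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query.

    Stand-in for a vector database lookup (assumption, not a real client).
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine the retrieved context and the user question into one prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def fake_llm(prompt):
    """Hypothetical placeholder for a real model call."""
    return f"[model response to {len(prompt)} prompt characters]"


def answer(query, documents, llm=fake_llm):
    """Run the retrieve -> prompt -> generate loop and keep every step."""
    docs = retrieve(query, documents)
    prompt = build_prompt(query, docs)
    return {"query": query, "context": docs, "prompt": prompt, "response": llm(prompt)}


# Made-up policy documents, for illustration only.
POLICY_DOCS = [
    "Refunds are available within 30 days of purchase with a receipt.",
    "Support is open Monday through Friday, 9am to 5pm.",
    "Shipping takes 3 to 5 business days within the country.",
]

result = answer("Can I get a refund after purchase?", POLICY_DOCS)
print(result["context"][0])
```

Returning the query, retrieved context, prompt, and response together is a deliberate choice in this sketch: those are the pieces an observability layer would want to capture for each request.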

But therein lies the problem: your LLM may hallucinate. For a given query, it might inadvertently include PII in the response, or it might sometimes be abusive. Someone could jailbreak your chatbot with a prompt injection attack and make it do something risky, like approve a payment. For enterprises to deploy and observe these AI applications successfully, they need continuous monitoring of what prompts are flowing into the system and what responses are coming out. Are they risky from a security perspective? Are they accurate?

Enterprises need the MOOD stack, the new stack for LLMOps to build, deploy, and manage LLM applications at scale. The stack comprises Modeling, Observability, Orchestration, and Data (MOOD) layers that are essential for LLM powered applications. Enterprises adopting the MOOD stack for scaling their deployments gain improved efficiency, flexibility, and enhanced support. 

An AI observability layer, such as Fiddler, is the most critical layer of the MOOD stack, enabling governance, transparency, and the monitoring of operational performance and risks of LLMs. This layer provides the visibility and confidence for stakeholders across the enterprise to ensure production LLMs are performant, safe, correct, and trustworthy. The AI Observability layer is the culmination of the MOOD stack, enhancing enterprises' ability to maximize the value from their LLM deployments.

Fiddler provides continuous observability by keeping track of everything flowing through your models (prompts, responses, and metadata) and providing a dashboard with rich model insights and alerts when things go wrong. This gives your engineers early warnings, so they can debug issues before they become major problems. A famous example is Air Canada, whose chatbot hallucinated a made-up answer about a potential refund for a customer, leading to a significant issue. Such incidents can happen to any company, and Fiddler provides the ability to prevent these problems and ensure safe and trustworthy AI applications in production.
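As a rough illustration of what "track prompts, responses, and metadata, then alert" can look like, here is a minimal in-memory sketch. It is not Fiddler's client API: the `Monitor` class, its `window` and `alert_rate` parameters, and the `flagged` field are all hypothetical names chosen for this example, and a real system would persist events and compute the flag from actual checks.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    """One logged interaction with the model."""
    prompt: str
    response: str
    metadata: dict
    flagged: bool
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Monitor:
    """Hypothetical in-memory monitor, for illustration only."""

    def __init__(self, window=5, alert_rate=0.4):
        self.window = window          # number of recent events to inspect
        self.alert_rate = alert_rate  # flagged fraction that triggers an alert
        self.events = []

    def log(self, prompt, response, flagged, **metadata):
        """Record an event and report whether an alert should fire."""
        self.events.append(Event(prompt, response, metadata, flagged))
        return self.check_alert()

    def flagged_rate(self):
        """Fraction of flagged events within the most recent window."""
        recent = self.events[-self.window:]
        return sum(e.flagged for e in recent) / len(recent) if recent else 0.0

    def check_alert(self):
        # Only alert once a full window of events has been collected.
        full = len(self.events) >= self.window
        return full and self.flagged_rate() >= self.alert_rate


monitor = Monitor(window=5, alert_rate=0.4)
alerts = [
    monitor.log(f"prompt {i}", f"response {i}", flagged=(i in (3, 4)), model="demo")
    for i in range(5)
]
print(alerts)
```

The sliding window is one simple way to get the early warning described above: the alert depends on the recent flagged rate rather than on any single bad response.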

There are a number of notable companies that are using Fiddler within their AI stack today. Are there any specific verticals that you've seen the most uptake in the product from thus far?

Financial services is a key area for us, where we partner with banks, insurance companies, credit card companies, and brokerage firms. These organizations face common challenges: they need to build trustworthy AI that meets regulatory standards, maintains their reputation, and ensures that AI predictions and responses are safe. From a business perspective, they need their AI to perform accurately, whether for credit underwriting, fraud detection, or generative AI use cases. We aim to provide a platform that mitigates the operational, regulatory, and reputational risks associated with AI.

Outside of financial services, we also work with tech companies in the Bay Area, hospitality companies, SaaS companies, and crypto firms. These companies use AI for a number of purposes, from stock recommendations to enhancing CRM experiences and improving marketplaces by connecting people with resources. They also focus on building better ranking and recommendation models. Our platform is versatile and not limited to any specific vertical - it’s quite a ubiquitous tool that anyone building AI can integrate into their stack.

AI observability has become a very sought-after space for a lot of players. What would you say differentiates Fiddler today versus others in the space?

I would say there are three big differences. Firstly, we are built for the enterprise. Our product and go-to-market strategy are designed to gain the trust of CIOs, CISOs, and Chief Data Officers. We focus heavily on enterprise readiness, offering deployment capabilities that allow Fiddler to run anywhere, whether in a data center or on a private cloud. It integrates smoothly with various AI stacks, and our customers find the onboarding and integration experience very straightforward.

Second, we have roots in explainability. In addition to providing ML metrics on model performance, we offer best-in-class root cause analysis. This means we can explain down to the individual ML model predictions, which is a significant differentiator in model monitoring.

Third, we have developed new technology for monitoring LLMs. The metrics for LLMs aren't straightforward: it is hard to determine whether a model is hallucinating or producing toxic content. We've created trust intelligence, small language models specifically designed to identify these issues. These fine-tuned trust models are part of the Fiddler AI Observability platform, constantly checking responses for problems such as poor relevance to the prompt or hallucinated content. They run very quickly and can be used as runtime guardrails, providing a score for each response and identifying issues in a task-specific manner. This is a major innovation, and we’re excited to see it working with several production customers already.
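The runtime guardrail idea, scoring each response and acting on the score, can be sketched in a few lines. This is only an illustration under stated assumptions: `relevance_score` is a toy token-overlap heuristic standing in for a fine-tuned trust model, and `guard`, the threshold values, and the fallback message are hypothetical, not part of Fiddler's platform.

```python
import re


def tokens(text):
    """Set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance_score(prompt, response):
    """Toy scorer: fraction of prompt tokens that reappear in the response.

    Stand-in for a fine-tuned scoring model (assumption).
    """
    p, r = tokens(prompt), tokens(response)
    return len(p & r) / len(p) if p else 0.0


def guard(prompt, response, scorers, thresholds,
          fallback="Sorry, I cannot help with that."):
    """Score a response on each task and swap in a fallback if any check fails."""
    scores = {name: fn(prompt, response) for name, fn in scorers.items()}
    failed = [name for name, s in scores.items() if s < thresholds[name]]
    return {
        "response": fallback if failed else response,
        "scores": scores,
        "failed": failed,
    }


scorers = {"relevance": relevance_score}
thresholds = {"relevance": 0.3}

ok = guard(
    "What is the refund policy?",
    "The refund policy allows returns within 30 days.",
    scorers,
    thresholds,
)
bad = guard(
    "What is the refund policy?",
    "Our office dog is named Biscuit.",
    scorers,
    thresholds,
)
print(ok["failed"], bad["failed"])
```

Keeping scorers in a dictionary keyed by task name mirrors the task-specific framing: additional checks (for example, a toxicity or faithfulness scorer) could be added alongside relevance without changing `guard`.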

So, our three big differentiators are our enterprise focus, our strong explainability features, and our trust models for assessing language models.

You mentioned financial services as an example of a sector where Fiddler is having a great impact. In that context, how do you measure the impact that Fiddler is having on some of your key customers? 

The first thing we monitor is usage. Are customers using our platform more? Are they registering and tracking more models? Usage acts as a proxy to indicate that our platform is working for them. The second aspect is value—what business value are we adding? For instance, at Fiddler, our goal is to increase transparency into AI across organizations. We have a number of customers, including one of the biggest SaaS companies in the Bay Area. They’ve built an AI control tower, which is a CIO-level dashboard they use weekly to review model performance on multiple dimensions.

They look at model accuracy week over week, the ROI of the models, and usage across teams. These models are used for improving sales forecasts and customer success, directly affecting their ARR. High model accuracy can lead to a higher ARR. They also track how teams are using model predictions, such as driving renewals or improving customer engagement. This is what Fiddler enables today, providing transparency not just to individual developers but across the entire organization.

For another example, one of our ad clients uses Fiddler to ensure compliance with Media Rating Council guidelines for brand safety models in ad tech. They generate reports from Fiddler, for auditors and standards-setting parties, to meet these compliance regulations, and share reports tailored for their clients to increase transparency and trust in the products powered by AI. This provides significant business value, which we track alongside usage metrics, the amount of data sent, and the number of alerts we’re generating.

How do you plan on Fiddler progressing over the next 6-12 months? Anything specific on your roadmap that new or existing customers should be excited for? 

Yes, I would say I'm very excited about trust models because they solve a significant pain point. When I worked on search, measuring search quality was always challenging. We needed to rely on human raters. For example, at Bing, we used a human rating system where people would assess whether a search result was excellent or poor. However, this approach doesn't scale well for all search interactions—you can't use a human rating system for everything in production.

Now, with the advent of language models, it seems possible to use these models as a proxy for humans to assess another language model. By carefully fine-tuning them, we can use language models to evaluate issues like hallucinations in responses or content moderation concerns like safety issues. This approach reduces the need for human labor, though you can still involve humans for red teaming and focus them on problematic areas instead of reviewing every model decision. I'm very excited about this area because it's new and made possible by recent improvements in LLMs.

Lastly, tell us a little bit about the team and culture at Fiddler. How big is the company now, and what do you look for in prospective team members that are joining?

We are still a mid-stage company with a 60-person team, 40 people in the US and 20 in Bangalore, India. We are growing both in the US and at our Bangalore office. We have focused on capabilities, looking for people with the right mix of ability, attitude, and passion. Competency is very important, of course, but we hire people with a startup mindset who can get things done, be scrappy, and operate effectively in a fast-paced environment.

We have leadership on our product and technology side from big companies and startups, including Airbnb, Facebook, Meta, Google, and Snowflake. On the go-to-market side, our team includes people from VMware, AWS, Mulesoft, and Imply. We look for people who align with our values, which include putting customers first, building a responsible company culture, and taking action. One of our core values is "you are the company," meaning when you see a problem, you go and fix it rather than complain about it.

Conclusion

To stay up to date on the latest with Fiddler, follow them on X and learn more about them at Fiddler.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email or DM us on Twitter or LinkedIn.