Meet EndoDINO - a SoTA foundation model revolutionizing endoscopy 🏥

Plus: Virgo CEO Matt Schwartz on developing AI solutions for healthcare...

CV Deep Dive

Today, we’re talking with Matt Schwartz, CEO and Co-founder of Virgo.

Virgo is a healthcare AI company revolutionizing gastroenterology by capturing and utilizing endoscopic video data at scale. Through its core platform, VirgoCloud, the startup enables clinicians, researchers, and pharmaceutical companies to unlock new possibilities in precision medicine and clinical workflows. Leading medical centers such as the Cleveland Clinic, Mount Sinai, the University of Chicago, Beth Israel Deaconess, and UMass use Virgo’s tools for video capture and AI-accelerated clinical trial recruitment.

Today, Virgo is announcing EndoDINO, a state-of-the-art AI foundation model for endoscopy, and EndoML, an AI development platform powered by the EndoDINO model. Together, these tools are designed to democratize and accelerate AI model development and deployment across gastroenterology. This announcement is poised to turbo-charge Virgo’s plan to become the go-to provider of AI-powered insights for clinicians and pharma companies.

In this conversation, Matt shares the journey behind Virgo, the challenges of developing AI solutions for healthcare, and how EndoDINO and EndoML are redefining what’s possible in gastroenterology.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Matt 💬

Matt - welcome to Cerebral Valley! First off, tell us a bit about your background and what led you to start Virgo?

Hey there! My name is Matt Schwartz, and I’m the CEO and Co-founder of Virgo. A bit about my background: I’m a biomedical engineer by training and, before starting Virgo, spent my career in medical devices and technology product management.

Right after undergrad, I joined a company called NuVasive, which was a leader in minimally invasive spine surgery devices. While at NuVasive I had the opportunity to help develop and launch several instrument and implantation systems for minimally invasive spinal fusion surgery.

After NuVasive, I went to work at Intuitive Surgical as a product manager on the da Vinci robotic surgery system. If you’re not familiar, da Vinci is absolutely incredible technology—arguably the pinnacle of surgical technology today. I always recommend people check it out on YouTube; if you search for “da Vinci Grape,” you can see the surgical robot peel the skin off a grape and suture it back on.

It was during my time at Intuitive, around 2015, that I caught the machine learning bug. I became really interested in machine learning and computer vision and had a light bulb moment when I realized the potential of computer vision for medical procedures. The only problem was that no one was saving video data from endoscopic medical procedures, despite its immense promise.

For context, endoscopy is a broad term for any medical procedure where a doctor uses a camera inside the patient to see around, make a diagnosis, apply therapeutics, or even perform surgery. Endoscopy is incredibly common—there are over 20 million GI endoscopies performed in the US each year, and it’s widely used in other specialties like pulmonology and urology. 

The reason we started Virgo was this belief that the data generated during endoscopic procedures is incredibly valuable, especially for machine learning and big data applications. We felt strongly that if we could build the infrastructure to capture and organize that data, we’d be well-positioned for developing future AI applications.

So, when we started Virgo, our initial focus was building the industry-leading video capture and management platform for endoscopic medical procedures.

How would you describe Virgo to an everyday individual who’s maybe less familiar with what you do? 

Our core product is called VirgoCloud. It starts with a small device; we like to compare it to an Apple TV for endoscopy. It’s compact, sleek, and connects to any existing endoscopic video processor, allowing doctors to continue using their normal equipment during procedures.

The device uses patented machine learning to automate the video capture process. Doctors perform their procedures as usual, and the device then compresses and encrypts the video before sending it to our HIPAA-compliant web portal.

Through the portal, doctors have access to a full library of their videos with all the features you’d expect from a modern video platform. They can organize, share, and trim videos, and we use AI for video analysis to highlight key moments.

This is the core of the Virgo platform. From there, we’ve layered on additional products and features, but it all starts with this foundation.

You’ve partnered with many of the top healthcare institutions in the country. Talk us through the early days of Virgo and who your initial customers were, and then how that's evolved over time as you've gotten further and further into the journey. 

It’s definitely been a journey! Healthcare is a fascinating industry with many different customer types and different incentives depending on the health system. 

In the early days, we focused on academic medical centers. Within these centers, we primarily worked with directors of endoscopy and chiefs of gastroenterology departments. We targeted forward-thinking gastroenterologists who were eager for a solution to capture and utilize their video data.

Often, these doctors wanted to record procedures to present at conferences if there was something particularly interesting, or to compile case series for training fellows. One of the key early use cases was recording videos to support research projects. This allowed doctors to show exactly what took place during procedures rather than just describing them after the fact. It also enabled video reviews post-procedure to analyze and improve care.

This focus helped us get started by installing Virgo at premier academic medical centers that were pushing the boundaries of GI care. Some of our early customers included Northwestern and the University of Virginia. Over time, we’ve been widely adopted by the leading academic GI programs in the country, and increasingly by private practices.

We’re also beginning to work with the VA system and several VA hospitals. From there, Virgo has proliferated largely through word of mouth, as GI is a tight-knit community. Today, doctors often share videos using the Virgo platform across institutions, further expanding its reach.

Fast forward to today, you have some exciting news to share! Take us through EndoDINO, your newly-released foundation model. It’s not often that you see a healthcare-focused startup actually training and releasing a foundation model for the industry to use. 

Absolutely! With our VirgoCloud platform, we’ve built a massive network for data capture in the gastroenterology space. We’ve now hit a critical mass of data, having captured over 1.75 million procedure videos, which, to our knowledge, is by far the largest dataset of its kind.

For a while now, we’ve been very interested in self-supervised and unsupervised learning for computer vision. Last year, we were inspired by work coming out of Meta, specifically DINOv2 (the research team is French and I think they pronounce it “Dee-no V2”). When we saw it introduced as a foundation model for vision, it felt like an inflection point—someone had figured out how to work with massive amounts of vision data in a self-supervised way, which really resonated with us.

We’ve got an incredible amount of video data, over 35 billion frames, but it’s not all labeled. So we’ve been waiting for the right moment with self-supervised learning. We started implementing aspects of DINOv2 into our infrastructure and using it in some of our AI models, specifically one called AutoIBD for identifying patients who may be eligible to participate in clinical trials for ulcerative colitis or Crohn’s disease.

We always felt that at some point though, we’d want to take the data we’ve captured and train a foundation model specifically for endoscopy. Last year, we began that journey and have now trained a foundation model for GI endoscopy, which we’re calling EndoDINO. It’s trained on the largest dataset of endoscopy videos reported in the literature. For this first version, we used a subset of our video data—just over 130,000 total videos—more than 3.5 billion frames.

From that, we used some interesting data curation techniques, creating datasets ranging from 100,000 images up to 10 million images, and we’re starting to explore the scaling properties in this space. Today, we’re announcing EndoDINO with a blog post on the AI at Meta blog and releasing our manuscript on the foundation model.

How much of a step-function improvement would you say EndoDINO’s specialized capabilities are relative to the existing pre-trained models available today? And what is the data you have so far telling you about that leap forward in capabilities? 

Our experiments show that simple adapters, often just linear probing, used with EndoDINO as a frozen backbone, achieve state-of-the-art performance across a wide range of GI endoscopy tasks, including recognizing different anatomy, segmenting polyps within images, and scoring disease severity for conditions like ulcerative colitis. What’s really cool about EndoDINO is that it generalizes to all different tasks within endoscopy and across different datasets. While we used our own data to train the model, we’re showing that EndoDINO generalizes to downstream tasks with data that comes from completely different representations within endoscopy. We think this concept of a foundation model is going to be really powerful in GI.
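Linear probing, as Matt describes it, means keeping the backbone's weights frozen and training only a small linear classifier on the features it produces. The sketch below illustrates that general idea in plain NumPy. It does not use EndoDINO: the interview doesn't describe EndoDINO's interface, so `frozen_backbone` is a hypothetical stand-in (a fixed random projection with tanh), the data is synthetic rather than endoscopy, and the learning rate and epoch count are arbitrary illustrative choices.

```python
import numpy as np


def frozen_backbone(images: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen encoder (NOT the EndoDINO API).

    A fixed random projection plus tanh; its weights are never updated,
    mirroring how a linear probe treats the backbone as frozen.
    """
    return np.tanh(images @ proj)


def train_linear_probe(features: np.ndarray, labels: np.ndarray,
                       num_classes: int, lr: float = 0.05,
                       epochs: int = 500) -> tuple[np.ndarray, np.ndarray]:
    """Fit a softmax (multinomial logistic regression) head by gradient descent.

    Only W and b are trained; the input features stay fixed.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the highest-scoring class for each feature vector."""
    return np.argmax(features @ W + b, axis=1)


# Synthetic demo data (illustrative only): 3 classes of 32-dim "images".
rng = np.random.default_rng(0)
num_classes, in_dim, feat_dim, n = 3, 32, 64, 600
centers = rng.normal(scale=2.0, size=(num_classes, in_dim))
labels = rng.integers(0, num_classes, size=n)
images = centers[labels] + rng.normal(size=(n, in_dim))
proj = rng.normal(size=(in_dim, feat_dim)) / np.sqrt(in_dim)

feats = frozen_backbone(images, proj)  # computed once, then reused
W, b = train_linear_probe(feats[:400], labels[:400], num_classes)
acc = (predict(feats[400:], W, b) == labels[400:]).mean()
print(f"held-out accuracy: {acc:.2f}")
```

Because the backbone is never updated, its features can be computed once and reused across many different heads, which is part of why adapters this simple are cheap to try on a new downstream task.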

If you look at a lot of the literature in AI for gastroenterology, most models are built on top of ImageNet—natural images that don’t contain any endoscopy. The field has been craving a GI-specific foundation model, and we’re excited to meet that need.

We’re providing early access to EndoDINO for some of our academic medical center partners who already use Virgo, as well as select pharmaceutical companies that want to build AI models for specific drugs, like those targeting inflammatory bowel disease. To support this, we’ve been building an AI development platform called EndoML, which will serve as the way to access the EndoDINO model.

EndoML includes tooling that allows users to build on EndoDINO using natural language. One challenge in developing AI for healthcare is bridging the gap between machine learning researchers and clinicians or research scientists. EndoML will enable users without a deep machine learning background to start experimenting with the foundation model and building models for downstream tasks. Once started, they can bring a promising experiment to a machine learning researcher, scientist, or engineer to refine and complete.

We’re really excited about this announcement. The foundation model is a huge milestone, and we’re eager to share more about the EndoML platform and how people can start using it.

Let’s shift gears to EndoML, which is another key part of your announcement today. Could you walk us through EndoML and how you see it complementing your foundation model release? Is it primarily about enabling others to fine-tune on top of EndoDINO?

That’s exactly right. A big part of building EndoML is meeting our own needs. We have EndoDINO, and then there are downstream models we want to train on top of it. Right now, most of that work happens as you’d expect—in Jupyter notebooks, where we build something, run experiments, see what works, and iterate from there.

Personally, I’m not a software engineer or a machine learning researcher. I know enough to be dangerous and can jump in when needed, but I crave a tool like EndoML. It’s designed to let someone like me bring their data and understanding of a downstream task to the table and rapidly start experimenting.

The idea with EndoML is that an academic health system already using Virgo to capture videos can access the platform, process their videos through the EndoDINO model, and have all of EndoDINO’s features available to build on top of. For example, if a doctor or health system wants to create a polyp classification system for a specific type of polyp, they can use their own data and, with natural language, quickly develop an adapter head for classification based on their data.

We think this opens up exciting opportunities for health systems not just to build their own downstream task models, but also to deploy them into clinical practice. That’s something we’re excited to unlock with EndoML.

Similarly, pharma companies can securely process their own endoscopy data from clinical trials for all sorts of applications. Say they want to study whether you can predict which patients will respond to their Crohn’s disease drug from a baseline colonoscopy. First, they’ll process their trial videos through EndoDINO, then experiment with building classification heads for patients who respond versus those who don’t. This turns endoscopy into a powerful data asset for precision medicine.

Have you found the reaction from the medical community towards Virgo and more broadly, towards AI to be overall positive? For elements that the industry is very concerned about, say with data privacy or model inaccuracies, how have you tackled any skepticism that might have come your way?  

The reception has definitely changed over the years. Healthcare can be a slow industry to evolve, and often for good reason—especially when it comes to data privacy and security. When we started, it was rare for doctors or health systems to record procedure videos. Just getting people on board with the idea of recording those videos led to a lot of interesting conversations.

Now, the tide is shifting. People are starting to see how impactful it can be to regularly capture endoscopy videos. As a patient, if I’m going in for a colonoscopy, I’d want that video recorded so my doctor could review it later if needed. With Virgo, this is now easily feasible across an entire enterprise.

We’ve also received investments from key industry players like the American Gastroenterological Association and Olympus, the largest manufacturer of endoscopy equipment. I think this reflects how the paradigm is shifting. Now it will be fascinating to see how EndoDINO and EndoML are received. I think they represent a new paradigm for AI research and development in GI. As a field, GI has been quick to study AI from a research perspective, but clinical adoption has been slower than you might expect.

With EndoDINO and EndoML, we see a huge opportunity to democratize not just research, but also deployment. If people can build models that are specific to their needs, those models are more likely to make it into clinical practice faster. This could have a real impact on how these tools are used in patient care.

Given your two exciting announcements today, what do you see as your main focus at Virgo over the next 6-12 months? Is there more product innovation coming down the pipeline, or are you mainly focused on partnering with more medical institutions? 

Product innovation is huge for us right now. We’re really excited to share what we’re doing with EndoDINO and EndoML. We already have some early-access partners who will be joining us on this journey with EndoML, and we’re eager to show the world what they build with EndoDINO.

We’re particularly excited about the pharma applications. Over the next couple of months, we’ll be releasing more research on some pretty revolutionary capabilities of what AI can do for pharma. 

And this is definitely a V1. We have a lot of exciting ideas for scaling up from here. We think we can increase the amount of data being used by at least one, maybe even two orders of magnitude. We’re also exploring new foundation modeling techniques for working with video data.

For the next 6 months, this will be our core focus. We’re aiming to have version two ready sometime in 2025.

Lastly, tell us a little bit about the team and culture at Virgo. What makes the team special, and what do you look for in prospective team members interested in joining?

We’re a team of 17, based in San Diego but fully remote, with people all over the country and even a few international team members. We’re a really lean team and like to think we punch above our weight, especially on the ML side. We’ve got two amazing ML engineers, and I try to chip in where I can. I’m biased, but I think it’s pretty impressive that with such a small team, we’ve been able to build a state-of-the-art foundation model for endoscopy.

We’re definitely looking for people who know how to do more with less, who are scrappy and pragmatic. One of the things we focus on, especially in machine learning, is practicality. We’re not necessarily trying to invent novel architectures or make breakthroughs in computer vision. Instead, we focus on what we do best: data capture and data utilization at scale.

We have an incredible resource in our dataset, and we like to pick and choose from state-of-the-art techniques to find the best ways to apply them in our specific domain. If there are people out there who are passionate about building AI models that have real impact in a specific domain—not just generalist models—we’d love to chat. We’re always on the lookout for top-tier talent who share that vision.

And for a final plug…I’ll be in San Francisco this coming week (January 13-16) for the JPMorgan Healthcare Conference to share EndoDINO and EndoML with prospective life sciences and biotech clients. If anyone is around and wants to connect, please reach out! 

Conclusion

To stay up to date on the latest with Virgo, learn more about them here.

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email or DM us on Twitter or LinkedIn.