
Nebius is Your Full-Stack AI Cloud and Inference Platform ⚡️🧩

Plus: Nebius Co-Founder Roman Chernin on building a vertically integrated AI cloud, scaling from model training to massive inference workloads, and why open source and data gravity will define the next era of AI.

CV Deep Dive

Today, we’re talking with Roman Chernin, Co-Founder of Nebius.

Nebius is an AI infrastructure company building a full-stack cloud designed for the next generation of model training and inference. Established in 2024, Nebius combines AI-specialized hardware, supercomputing performance, and a developer-friendly cloud platform that serves everyone from individual researchers to enterprises running large-scale GPU clusters.

Today, Nebius powers workloads for research labs, fast-growing AI startups, and increasingly for vertical product companies and enterprises that are shifting from frontier models to customized open-source systems.

Nebius recently launched Token Factory, with the goal of giving companies the flexibility to build on both proprietary and open-source models, optimize for cost, latency, and throughput, and seamlessly scale workloads from training to production inference. Its architecture, built on fully virtualized infrastructure with bare-metal-level performance, allows customers to fine-tune models, run high-availability inference endpoints, and integrate deeply with their existing data pipelines.

Nebius has also signed multi-billion-dollar infrastructure agreements with hyperscalers including Microsoft and Meta, bringing additional scale and validation as it expands its multi-tenant cloud business.

In this conversation, Roman shares how Nebius was built, why full-stack control matters in AI infrastructure, and his vision for how open-source models, data gravity, and next-generation inference platforms will shape the future of AI.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Roman 💬

Roman, welcome to Cerebral Valley! First off, introduce yourself and give us a bit of background. What led you to build Nebius?

Thank you for having me. I joined the Nebius founding team in 2022. Before that, I spent 11 years at Yandex. Before Yandex, a long time ago, I started as a software developer, but quite quickly realized the people around me were much more brilliant than I was, so I shifted to management. Throughout my career, I’ve had the privilege, and still have the privilege, of working with some of the most talented engineers in the world.

At Yandex, I led search, and then maps and navigation. At Nebius, I’ve worked across various functions, mostly spanning go-to-market, sales, and product—whatever founders typically need to do in the early years to drive growth. Now my role is focused more on new product development: overseeing our software initiatives that go beyond providing, in cloud terms, infrastructure as a service. To be clear, when I say infrastructure as a service, I don’t mean bare metal clusters. We already have a robust software platform on top of our hardware, but from the customer perspective, it's still primarily about access to infrastructure.

We see the company’s future in building the software stack upward, and I’m working with the team to make sure we deliver the next layers of value.

You said you joined the founding team around 2022. Did you have a vision of an AI cloud when you first joined, and what led you to start building so early in the AI cycle?

When we started in 2022, it was mostly forced. Before the real story of the company began, we had to go through a complex and painful corporate restructuring to separate from Yandex. We knew we would build a cloud, but we didn’t yet know it would be AI-centric.

When the GPT moment happened at the end of 2022, everything we knew from our past, including our relationship with Nvidia and our understanding of their work, made it clear that our future was in building an AI cloud from that moment forward.

How would you describe Nebius to developers who may not be familiar?

It’s quite simple. What do you expect from a cloud? We combine the performance of AI-specialized hardware with HPC-level capabilities, and deliver it all through a developer-friendly cloud experience.

We have a multi-tenant cloud and a fully robust software platform that lets developers spin up environments and get to work quickly. The same platform helped us scale fast because it allows us to serve a wide range of customers: from individual developers using our self-service flow, swiping a credit card and grabbing a few GPU hours to run experiments, all the way up to large managed customers running multi-thousand-GPU interconnected clusters.

Talk to us about how customers have been using Nebius! Any interesting stories you’d like to highlight?

We have the privilege of working with some of the most advanced and fastest-moving developers in the history of the tech world. Until a few months ago, people mostly came to us to build and run their models. That included large-scale training jobs and large-scale inference. The first wave of customers were research-driven model builders, from the largest teams to very small ones.

Then, around the beginning of this year, we started seeing a second wave. These customers are mostly vertically focused product companies along with software vendors and cloud-native digital enterprises. They aren’t coming to build models. Instead, they’re building solutions on top of models as a service.

That shift is why one of my key initiatives is Nebius Token Factory: an inference platform with related services that help people bring models into production, run them at scale, fine-tune them, collect the data generated in the process, and more.

I’d say that most of our customers have a lot in common. They’re deep tech, very direct, very demanding, online 24/7, and they make decisions fast. It’s not the enterprise world yet, and there is a lot of fun in that dynamic. Large deals happen overnight. People show up with a great product and need massive capacity immediately. One of our early customers came to us after a successful product launch, when their inference demand grew from a few hundred GPUs to 10,000 in a single week, and they needed to source that capacity right away.

It’s very fast, very technical. But as I said, the profile of customers and the types of companies we work with is starting to change. Before, it was mostly research-driven foundational technology builders. Now we’re seeing a lot more product builders, and I’d say that’s a great trend. Building models is an investment stage, but now we’re seeing models that are so capable they unlock scenarios in almost every industry: coding, legal, customer support, sales, procurement, supply chain optimization, you name it. I don’t think there’s a single vertical that isn’t at least starting to be disrupted by these products.

We’re starting to see real revenue-generating businesses emerge, and for us that’s an exciting moment because it means we can be fully aligned with our customers. The more efficient the infrastructure we build, the more robust the service we provide, the more successful they become, and the less dependent they are on the next fundraising cycle. If their business grows, our business grows with them. It’s a great time to be in AI.

I’d like you to walk us through Nebius’s products, particularly with Nebius Token Factory. Which use cases should new customers experiment with first, and how easy is it for them to get started?

Token Factory’s mission is to help companies build vertical AI solutions on open-source models. The pattern we see is that people mostly start with closed ecosystems. They validate that the use case works and that it delivers real value, and at that stage they don’t want to take risks on the underlying technology. But when growth comes, there are strong reasons to start shifting to open-source models: economics, latency optimization, or the need to fine-tune and post-train models to meet very specific product requirements with proprietary data, which is becoming the main moat for all vertical AI companies.

That’s the stage where Token Factory meets them. We provide a scalable, robust inference platform to run models at scale, whether that’s popular off-the-shelf models like DeepSeek, Qwen, Llama, or Nvidia’s Nemotron, or models customers have fine-tuned themselves. If they need a model optimized for their specific use case, we support them in optimizing it even further. Most customers start their journey asking how to meet their requirements, and every case involves optimizations and tradeoffs based on what matters most for their product.

Price, quality, throughput, latency: you need to balance all of it. We help customers get the exact endpoint and service they need with the SLAs they require, and provide the tools to achieve those results. We’re already seeing customers achieve dramatic outcomes: one company cut its costs 26x compared to proprietary models while processing hundreds of billions of tokens daily, with autoscaling that eliminated manual intervention.

Around that core, we’re building evaluation tools, fine-tuning as a service with your own data, observability, enterprise features like security and access control, and more. It is a product designed for building AI at scale.

Our initial hypothesis was that we’d mostly serve small independent teams. But the reality is that, at scale, people don’t usually build on open-source models from the start. They begin with frontier models: OpenAI, Anthropic, Google, and so on. We meet them when they’ve reached the stage where scale-driven requirements push them to consider alternatives. The ultimate goal is to make that transition from closed-source models to open-source models as smooth as possible. And it’s not just about the model itself; it’s the entire developer experience around it: the tools, the workflows, and how teams operate. That’s what Token Factory is.
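For developers curious what that transition looks like in practice, here’s a minimal sketch. It assumes the open-source endpoint speaks the OpenAI-compatible API that most inference platforms expose; the base URL, API key, and model names below are illustrative placeholders, not confirmed Token Factory values.

```python
# Sketch of a closed-to-open model migration, assuming an
# OpenAI-compatible endpoint. The base_url, API key, and model
# names are placeholders, not confirmed platform values.
from openai import OpenAI

# Before: calling a proprietary frontier model.
closed = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = closed.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
)

# After: the same code path, pointed at an open-source model endpoint.
open_src = OpenAI(
    base_url="https://inference.example.com/v1",  # placeholder endpoint
    api_key="YOUR_PLATFORM_API_KEY",              # placeholder credential
)
resp = open_src.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # or a fine-tuned variant you deploy
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
)
print(resp.choices[0].message.content)
```

The point of the pattern: because only the client construction changes, the surrounding product code, prompts, and evaluation harnesses can stay intact while the model underneath is swapped.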

There are a number of companies working in the compute space, specifically in cloud. What sets Nebius apart?

Our long-term belief is that you need a full-stack approach to win here. We’ve already proven this in cloud competition: having all the layers, from physical infrastructure through software, validated that thesis. And now with inference, we can provide better service, more flexibility, and a more optimized experience because we’re building the inference platform directly on our own infrastructure.

It starts with flexibility around compute, and we’re in a world where compute is king. Flexible access to new chips, pay-as-you-go consumption without long-term reservations, the ability to handle spiky workloads without over-provisioning, all of that comes from how we’re structured. Inference at scale isn’t just kernels and token-level GPU optimization. It’s the orchestration around it. That’s our main strength, our bread and butter. These are the capabilities that come from being a cloud provider.

I should say: there are many great products out there, and we still have a lot of work to do on optimization and developer experience. But what sets us apart from other folks—AI-native competitors, not hyperscalers—is fundamentally the full-stack approach.

It gives us a better cost structure, more flexibility for customers, and more ways to optimize across different layers. Many of our training customers also come to us to run inference as a service. One of the key values we provide is the ability to run different types of workloads on the same compute.

When you reserve a training cluster, you never consume it at 100% utilization. So the spare compute you’ve already paid for is perfect for serving inference spikes. That means less over-provisioning. And because we have visibility across different workloads and work with customers across multiple use cases, we can serve them better: both economically and from an optimization perspective.
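To make the over-provisioning argument concrete, here’s a back-of-the-envelope sketch; every number in it is hypothetical, chosen only to illustrate the mechanism, not taken from Nebius.

```python
# Toy model of workload mixing: spare capacity on a reserved training
# cluster absorbs inference spikes. All numbers are hypothetical.
reserved_gpus = 1024        # size of a reserved training cluster
avg_training_util = 0.75    # training rarely runs at 100% utilization
spare_gpus = int(reserved_gpus * (1 - avg_training_util))  # ~256 idle GPUs on average

baseline_inference = 300    # steady-state inference demand, in GPUs
peak_inference = 500        # spiky peak demand, in GPUs

# Without mixing, you provision dedicated capacity for the peak.
dedicated_needed = peak_inference

# With mixing, spare training capacity covers the spike above baseline.
spike = peak_inference - baseline_inference
mixed_needed = baseline_inference + max(0, spike - spare_gpus)

print(f"dedicated inference fleet: {dedicated_needed} GPUs")
print(f"with workload mixing:      {mixed_needed} GPUs")
```

In this toy case the spike fits entirely inside the training cluster’s idle headroom, so 200 GPUs of over-provisioning simply disappear.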

At the end of the day, inference is very much an economics game. It’s not a one-time investment; it’s the unit economics of the product. Especially in B2C scenarios, squeezing every percentage point of performance matters because it unlocks new use cases and new opportunities for growth.

Could you share some more about how you build a cloud from the ground up? What were some key technical decisions you made early that paid off?

The fundamental technical decision we made was to build a true cloud, not just clusters and automation tools: a fully virtualized environment. It took a while to convince the industry and customers that virtualization doesn’t cost them efficiency; we’ve proven it through benchmarking with Nvidia and MLCommons. We don’t virtualize the most performance-critical resources, GPUs and InfiniBand; we pass them through. But from the customer-experience side, this decision was crucial. It impacts reliability and usability: you can provision quickly, and if a physical node goes down, you relaunch a VM on a new one immediately.
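For readers unfamiliar with passthrough, here’s a generic KVM/libvirt sketch of the technique: the guest is handed the physical GPU directly rather than an emulated device. Nebius hasn’t detailed its own stack here, so treat the mechanism, VM name, and PCI address below as illustrative assumptions.

```python
# Generic PCI passthrough with libvirt: attach a physical GPU to a VM
# so the guest sees the real device. Names and addresses are
# placeholders; this illustrates the general technique, not Nebius's
# actual implementation.
import libvirt

GPU_HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- host PCI address of the GPU (placeholder) -->
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")  # connect to the local hypervisor
dom = conn.lookupByName("gpu-vm-0")    # placeholder VM name
dom.attachDevice(GPU_HOSTDEV_XML)      # GPU now appears natively in the guest
conn.close()
```

Because the device is passed through rather than emulated, the guest runs the stock Nvidia driver against real hardware, which is why virtualization need not cost GPU efficiency.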

Everything we’ve built is API-driven. We’re becoming a preferred destination for partners who need not just a cluster, but a platform to build their products on. When you build a product on top of infrastructure, you need every part of that functionality exposed as APIs that work consistently. That has to be baked into the foundation across the entire stack.

Physical infrastructure is another area where we have invested heavily. We build our own racks: designed to Nvidia’s reference architecture, but manufactured by us. That gives us both a cost and a technological advantage. In hybrid models where you don’t control the underlying infrastructure, or where you procure components from others, you can’t always guarantee quality. Products in this space move extremely fast, and if you can’t go down the stack and debug everything, from datacenter power to API edge cases, it becomes hard to deliver reliability.

So if you think about Nebius, the core difference is that we’re software-driven and we go through the entire stack end-to-end.

You see these announcements of massive data center projects happening almost every day now. How do you foresee Nebius evolving over the next year? Any developments that users should be most excited about? 

We now think of ourselves as having two lines of business. One is the super-large bare-metal builds for hyperscalers and superlabs that everyone in the market wants the privilege of doing, and we’re no exception. We signed with Microsoft, and recently with Meta. We don’t see that as the final state of our business, but it’s a great validation for the team and technology, and honestly a great source of capital to build our core business.

The core business is the multi-tenant cloud we’ve been discussing. Next year is still a year of extremely fast growth for us. In our latest earnings call, we announced ambitious goals across every dimension: contracted power, connected power, deployed GPUs, and revenue. Delivering on that requires a lot of work.

We think about the company’s development in two dimensions: horizontal scale and vertical growth through products. Next year is also the year of software for us. Token Factory is the main focus, but we’re also building components of the data platform because long term, we believe data gravity and your position in the data layer are just as important as compute. Compute isn’t that sticky. Data creates the moat.

The vision and goals we’ve set include becoming much more diversified in our customer portfolio. That includes the full spectrum: from model builders to product builders, AI natives to digital natives, and eventually classical enterprises. Each new type of customer brings new product requirements: compliance, security, higher-level abstractions, moving from clusters to endpoints, and more. All of that pushes the product forward.

Lastly, tell us a bit about the team at Nebius. How would you describe your culture? What do you look for in new team members? 

We’re growing incredibly fast, and honestly there isn’t much we aren’t looking for. On the engineering side, we’re very focused on inference right now, specifically around inference optimization and post-training. Those are the two big areas, and they require a mix of research, applied development, and engineering.

We also need to grow our solutions architecture team. Customer-facing engineering is critical in our business: supporting people who train models, run models, optimize models, and everything in between. And we want to grow the ecosystem of software partners around us, which means being a good partner to build with and a strong platform to integrate with. That’s a different set of requirements from what customers look for, but it’s just as important.

And of course, we’re building huge data centers, so we need every type of engineering talent: hardware, networking, you name it.

So if you're technical, go work for Nebius?

Absolutely. We’re hiring in both the US and Europe. Our initial core team was in Europe, and engineering is still largely based there. But we’re growing our presence in the US very quickly, and it’s far from just a sales team now. We have many technical roles in the US as well.

Anything else you'd like our readers to know about yourself or Nebius?

Try our products. Customer feedback and experience speak better than anything else. We’re super excited about the timing, and we’d love to work together; that’s the main thing.

Conclusion

Stay up to date on the latest with Nebius by following them here.


If you would like us to ‘Deep Dive’ a founder, team or product launch, DM our chatbot here.
