Axolotl is Flying the Flag for Open-Source AI ❇️

Plus: Founder & CEO Wing on the impact of open source on applied AI...

CV Deep Dive

Today, we’re talking with Wing Lian, Founder and CEO of Axolotl

Axolotl is an open-source library that simplifies fine-tuning large language models, with a focus on accessibility, flexibility, and speed. Originally built by Wing in the early days of LLaMA as a side project for learning how to fine-tune LLMs, Axolotl has since evolved into a robust framework trusted by engineers, researchers, and AI teams to power supervised fine-tuning, reinforcement learning, and continuous pretraining workflows—all configurable from a single YAML file.

Today, Axolotl is used across industries and academic research labs alike, with adopters ranging from open-source leaders like Nous Research to university teams developing domain-specific models in healthcare and beyond. Its Docker-first approach, deep support for Hugging Face datasets, and optimizations like dataset packing and Flash Attention make it a go-to for users looking to move fast without sacrificing efficiency or scale. Axolotl’s no-code design enables developers—especially those without a traditional ML background—to experiment, iterate, and productionize their own models with ease.

In this conversation, Wing shares how Axolotl was founded, the platform’s technical underpinnings, and his vision for how open infrastructure can accelerate progress in applied AI—from agents to efficient training pipelines.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Wing 💬

Wing, welcome to Cerebral Valley! First off, introduce yourself and give us a bit of background on you and Axolotl. What led you to found Axolotl?

Hey there! My name is Wing and I’m the Founder and CEO of Axolotl. Before Axolotl, my background was really more in DevOps and back-end engineering. I hopped on the AI train a couple of years ago when LLaMA was first released. I was very interested in it—a confluence of circumstances, including injuries and lots of free time, led me to want to fine-tune LLaMA and experiment. I built Axolotl more as a side project during my own journey of learning about fine-tuning LLMs, approaching it from the perspective of an applications engineer rather than a traditional ML engineer.

I think that approach gained traction because people understood the value in it, and it just picked up momentum from there.

How would you describe Axolotl to the uninitiated developer or AI team?

Axolotl is really a library for simplifying the post-training of large language models across various architectures and techniques. We strive for strong performance and support for distributed workloads.

Tell us about your key users today. Who would you say is finding the most value in what you're building with Axolotl? 

I’d say the user we're optimizing for is probably someone with a background similar to mine—more of an application developer who’s been tasked with adding AI to whatever they’re building. They might not have a deep ML background but have a solid understanding of AI, LLMs, and the data that goes into them. This is really for users who want to experiment and iterate quickly.

Talk to us about some existing use-cases for Axolotl. Any interesting customer stories you’d like to highlight? 

One of the things I carried over from deploying more traditional applications was using Docker for everything. The experience with Docker and CUDA has been pretty good over the past couple of years—it works really well. In my own journey trying to get an ML stack running, it’s often been pretty painful. That’s why we’ve always shipped Docker images with Axolotl built in. We usually suggest going to a GPU provider that supports Docker images, lets you deploy from one, and gives you access to the container—then just go from there. If you do that, the setup is minimal.

It’s usually just however long it takes to download the Docker image and spin it up—depending on the provider, that might be 2 or 3 minutes—then you’re ready to go. If you’re just experimenting, you can try it from scratch using one of the many examples already baked into the repository. Or if you already have a dataset and a specific open-source or open-weights model in mind, it might take a minute to edit a YAML file, tweak it, and start the training.
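
To make that concrete, here's a rough sketch of what one of those YAML files looks like. The field names follow Axolotl's public examples, but the specific model, dataset, and hyperparameters below are purely illustrative rather than a recommended recipe:

```yaml
# Minimal fine-tuning config sketch (values are illustrative, not a recipe)
base_model: NousResearch/Meta-Llama-3-8B   # any model supported by Transformers

datasets:
  - path: tatsu-lab/alpaca   # a Hugging Face dataset, or a local file path
    type: alpaca             # tells Axolotl how to parse each row

output_dir: ./outputs/first-run
sequence_len: 4096
sample_packing: true         # pack multiple samples per sequence (more on this below)
flash_attention: true

micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-5

# Launch training against the config, e.g.:
#   axolotl train config.yaml
```

From there, iterating is mostly a matter of editing the file and re-launching.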

Walk us through Axolotl’s platform. Which use-cases should new customers experiment with first, and how easy is it for them to get started? 

When we started, we were definitely focused on supporting a lot of bleeding-edge models. These days, Hugging Face has gotten much better at offering zero-day support for many new model architectures as they come out. Traditionally, we’ve had very good support for the more mainstream model creators—Meta’s LLaMA, Qwen, Mistral, Google’s Gemma—you name it. As long as the model is supported in Transformers, we generally support it well.

As for datasets, Axolotl is optimized for datasets stored on Hugging Face. If you’re an enterprise customer, you can pull directly from your organization’s private datasets. Even if you’re not using the enterprise tier, as long as the dataset is set to private and you have access, you can still use it. We also support local and S3-backed datasets in formats like JSONL or Parquet—basically however your text data is stored.

One of the features users really like is the flexibility of dataset support. Beyond just where the dataset lives or the storage format, we support a wide range of dataset structures. Whether it’s a chat dataset or one where inputs and outputs live in separate columns, we have strong configurability to let you bring your own dataset and remap or parse it on the fly using YAML configuration. And if your dataset doesn’t fit those structures exactly, there’s the OpenAI chat dataset format, which most people are now familiar with and which has become pretty common.

Each row is just a JSON object with a messages list, where each message has role and content fields, and we support that as well. Many fine-tuning providers use this format, so if you already have it, you’re good to go.
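
For reference, here is a rough sketch of a single row in that format and a datasets entry that might consume it; the chat_template type and the local path are illustrative, not canonical:

```yaml
# One row of an OpenAI-style chat dataset, shown as a comment; on disk it is a
# single JSON line:
#   {"messages": [
#     {"role": "system", "content": "You are a helpful assistant."},
#     {"role": "user", "content": "Summarize this ticket."},
#     {"role": "assistant", "content": "The customer is reporting..."}
#   ]}
#
# A dataset in that shape can then be referenced from the config, for example:
datasets:
  - path: ./data/conversations.jsonl   # illustrative local path
    type: chat_template                # parse rows using the model's chat template
```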

How are you measuring the impact and/or results that you’re creating for your key customers? What are you most heavily focused on, metrics-wise?

It's been a while since I’ve done a proper review of what everyone has built with Axolotl, so my understanding is a bit out of date. We’ve seen a lot of researchers using it—some working on medical models, mostly through universities. Others are doing general research. Nous Research, for example, trains a lot of their open-source models, like Hermes, using Axolotl.

There are a number of companies working on fine-tuning. What sets Axolotl apart from a product or technical perspective? 

A big feature that users really like about Axolotl is packing—the ability to concatenate multiple samples or rows from a dataset and train on them in a single batch. Without getting too deep into the technical details, packing helps reduce the amount of compute wasted on padding tokens. Because dataset rows can vary in length, traditional batching often ends up using extra compute on those padding tokens. With packing, we replace a lot of that padding with actual trainable tokens, which can lead to significant speedups—often 2 to 6 times fewer steps.

Since we're filling most of the context window, this can translate to a 50% to 100% speedup for many workloads compared to more traditional training approaches. We also have solid support for attention optimizations like Flash Attention and Flex Attention. The PyTorch team has been great about supporting packing and Flex Attention recently, and xFormers has also added support for this, which has been great to see.
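
Intuitively, if your rows average a few hundred tokens and the context window is 4,096, each packed sequence holds several samples, which is where the reduction in steps comes from. In config terms, packing and the attention kernels are just a few flags; here is a sketch with illustrative values:

```yaml
# Packing-related settings as they appear in a config (values illustrative)
sequence_len: 4096          # the context window we try to fill
sample_packing: true        # concatenate multiple rows into each sequence
pad_to_sequence_len: true   # keep batch shapes uniform for the attention kernels
flash_attention: true       # fused attention kernel that respects packed sample boundaries
```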

Another standout is the dataset handling. If you're bringing your own datasets or working with a mix of different datasets—including open-source ones—you can just add them to a list, and Axolotl will pull them all in, mix them, and you're ready to go. It's all relatively low-code, with almost everything configurable through a YAML file, so users can get started without needing to write any actual code.
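
As a sketch of what that list looks like (the paths, dataset types, and S3 syntax here are illustrative, not canonical):

```yaml
# Mixing several datasets from different locations; Axolotl loads and blends them
datasets:
  - path: teknium/OpenHermes-2.5        # a public Hugging Face dataset
    type: chat_template
  - path: ./data/internal.jsonl         # a local JSONL file in a different structure
    type: alpaca
  - path: s3://my-bucket/extra.parquet  # an S3-backed Parquet file (illustrative URI)
    type: chat_template
```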

Could you share a little bit about how Axolotl’s platform actually works under the hood? What’s the reasoning behind some of the architectural decisions you made?

The big piece is that you can do RL, supervised fine-tuning, and continuous pretraining—all from the same YAML interface. It’s very easy to switch between these without needing to instantiate new trainers or write new code.

Another major advantage is the composability of features. You can quickly experiment with things like FSDP, DeepSpeed, LoRA, QLoRA, different forms of quantization, or run various sweeps. All of that is easy to configure through the YAML file.
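
A rough sketch of how those toggles tend to look in a config (the keys follow Axolotl's documented options; the values are illustrative):

```yaml
# Adapter / quantization choices
adapter: qlora           # or "lora"; omit for a full fine-tune
load_in_4bit: true       # quantize base weights for QLoRA
lora_r: 32
lora_alpha: 16
lora_target_linear: true

# Distributed training: point at a DeepSpeed config...
deepspeed: deepspeed_configs/zero2.json
# ...or use FSDP instead (pick one, not both)
# fsdp:
#   - full_shard
#   - auto_wrap

# Post-training method: leave unset for supervised fine-tuning,
# or select an RL trainer, e.g.
# rl: dpo
```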

You also get strong observability if you're using tools like Weights & Biases. We make sure to store your YAML configuration as an artifact, so you can easily go back, download the file, and know exactly how to reproduce a run. Reproducibility is a big deal for many teams.
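
The relevant config keys are small; a sketch with illustrative values:

```yaml
# Log the run to Weights & Biases; the YAML config is stored alongside the run
wandb_project: axolotl-experiments
wandb_entity: my-team
wandb_name: llama3-8b-qlora-run1
```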

As an open-source project, we know things move fast and break often. Upstream changes can break things downstream and vice versa. We’ve invested heavily in CI and testing so we can maintain confidence in releases without needing a lot of manual testing. Like any good engineering team, we’ve tried to make sure things are stable even as the ecosystem evolves quickly.

2025 is the year of the AI Agent. How does this concept factor into your product vision or internal process at Axolotl, if at all? 

Reinforcement learning (RL) has always been a longer-term focus for Axolotl. One of the more complex challenges in RL is orchestrating both inference and training workflows, which is something we’re actively thinking about and aiming to support better over time.

When it comes to agents specifically, the current ecosystem tends to treat training and agents as two separate abstractions. Our goal isn’t necessarily to build agent frameworks directly, but rather to provide the flexibility in dataset composition and RL technique selection so that users can train agent-like systems if they choose to.

So while agent support isn’t the core focus of Axolotl, the infrastructure we’ve built—especially around RL—is designed to be adaptable enough to support those workflows. It’s still an evolving area, and something we’ll continue to improve.

What has been the hardest technical challenge around building Axolotl into the platform it is today? 

We’re really focused on helping developers scale to bigger models and more compute. With larger models emerging, we have a dual focus on supporting both big and small models. For bigger models, we’re working on N-D parallelism techniques like tensor parallelism and pipeline parallelism. We’ve also added sequence parallelism recently and want to extend that to other RL techniques, such as GRPO, so you can train with much longer context lengths, especially if you have many GPUs.

For GRPO, we’re adding quantization-aware training. Since models are often quantized during deployment, training with fake quantization of weights helps reduce accuracy loss when deploying quantized models.

On the smaller model side, the focus is on lower latency while retaining high accuracy. We’re looking at how to better support knowledge distillation and make that process easier. Ideally, we want to automate the pipeline where you provide a dataset, train a large model, and then perform knowledge distillation to produce a smaller, efficient model or speculator.

How do you foresee Axolotl evolving over the next 6-12 months? Any product developments that your key users should be most excited about? 

There’s a lot of excitement around open source AI right now. With thousands of papers published regularly, the only way for everyone to progress is by sharing work instead of independently implementing the same ideas behind closed doors. That kind of isolation doesn’t help anyone move faster. Often, people share research papers but not the actual code or implementations. Since many papers are highly technical, converting them into working code can be quite challenging for most people. That’s why I’m very bullish on open source.

Lastly, tell us a bit about the team at Axolotl. How would you describe your culture, and are you hiring? What do you look for in prospective team members joining the Axolotl team?

We have an amazing community manager named Nano. He actually started out as a grad student in Japan who was helping out on the side, and then after he graduated, he joined us full-time last year. These days, he’s really the face of Axolotl—you’ll probably end up chatting with him on Discord or GitHub because he’s the first line of support for most users.

We’re actively looking for more people to help with community management because, even though Axolotl is open source and free, people still want to feel like they’re getting help and support. We try to make everything as easy as possible, but sometimes we realize we might have blind spots about what’s actually easy or hard for new users. That’s why having more community folks is really important to me—to make sure new users feel welcomed and supported.

Conclusion

Stay up to date on the latest with Axolotl, and learn more here.


If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.