TensorStax Is Driving The Future of Data Engineering 🎛
Plus: CEO/Founder Aria Attar on how TensorStax is reaching 85–90% accuracy through a compiler-like system...

CV Deep Dive
Today, we’re talking with Aria Attar, Founder and CEO of TensorStax.
TensorStax is building autonomous agents for one of the most challenging but essential problems in AI infrastructure: data engineering. Instead of asking companies to adopt new tools or workflows, TensorStax plugs directly into existing stacks—tools like dbt, Airflow, Spark, and Databricks—and acts as a deterministic labor layer that helps engineers build, maintain, and debug complex data pipelines with minimal friction.
The company has developed its own in-house agent framework, an abstracted middle layer it calls the “LLM compiler,” and is now training models specifically optimized for this space. By combining precision-focused architecture with reinforcement learning loops and verifiable task environments, TensorStax is pushing the limits of what autonomous systems can do in high-stakes, high-complexity infra. Today, TensorStax is working with mid-market and enterprise customers who are buried in brittle DAGs, broken schemas, and endless maintenance overhead.
Following Aria’s last CV Deep Dive in September 2024, TensorStax is doubling down on fine-tuning its models for tool-specific sub-agents, and investing in a dedicated research team focused on reinforcement learning in structured data environments. Aria believes the future of data engineering isn’t replacement—it’s augmentation. And with only one data engineer for every 250 software engineers, supercharging that role makes building TensorStax more critical than ever.
In this conversation, Aria breaks down how TensorStax is reaching 85–90% accuracy on agentic tasks with zero RL, why determinism matters in data engineering, and how the company is building toward a world where agents are quietly maintaining your entire data stack in the background.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Aria 💬
Aria, welcome back to Cerebral Valley! As a quick refresher, introduce yourself and give us a bit of background on you and TensorStax. What led you to co-found TensorStax?
Back in 2018, I graduated high school and got really interested in machine learning and deep learning, when Tesla started showing the viability of FSD through cameras. I wanted to dive straight into working with startups in the ML/AI space, so I started learning a lot of this stuff on my own. I wanted to drop out of university, but my parents weren’t too happy about that, so I did a couple of semesters before finding the confidence to drop out.
After that, I worked at different startups in various roles, mostly in data science, ML and some data infra technical GTM roles. I even tried to start my own company around that time, but we had to shut it down pretty quickly. Eventually, I moved into some go-to-market roles to learn the other side of startups, and that’s kind of how I got to where I am today.
Another quick one - how would you describe TensorStax to the uninitiated developer or AI team?
We're building autonomous data science and ML engineers, where these agents can help with everything from data engineering and ETL jobs to training models, deploying them, and setting up observability. Our primary users fall into two categories. The first is software engineers who haven't spent much time with data science and ML, and we're helping them get to a baseline level of value with some solid processes in place.
The second group is data scientists and ML engineers. They often have more work to do than they can manage, and we're aiming to speed up their processes significantly. The core thesis behind this is that there aren't nearly enough data scientists and ML engineers. If you look at the numbers, there's only about one machine learning engineer for every 250 software engineers, which is a pretty crazy ratio. And that's a big part of the problem we're trying to solve.
You have a couple of really exciting updates to share since we last spoke… what’s been up?!
The main exciting point to talk about is that we just closed our $5 million seed round, and we're off to the races now. Even better, the majority of the inbound came after our last Deep Dive in September 2024!
What’s more, we found that most of the interest around helping with data science was actually focused on data engineering in particular. You need to get the data foundation right before you can even touch any applied data science. So we decided to go all in on this as we started iterating more and more with customers.
One of the core insights we figured out is this idea that if we can get these systems to work with whatever existing tech stack a company has today—tools like Airflow, dbt, Spark, and so on—it ends up driving way more value. The customer gets to keep the infra they already have in place, they don’t need to adopt new tools, and our product just acts as a labor layer on top of everything. That’s incredibly powerful, especially as we move toward this direction of having more and more abundant resources. It’s kind of crazy to say, but it’s the direction we’re headed.
All-in, this is a really interesting space because data engineering is at the core of almost any application or applied use of data science—any enterprise runs on a complex data stack.
What does this mean for TensorStax’s product as customers know it today?
One of the things we've learned from customers is that there's a lot more interest in augmenting data engineers, rather than having these systems fully run autonomously or act as replacements for people. That’s just not something customers are excited about, and frankly, it’s not something we’re that interested in either. What we find compelling is the amount of repetitive work data engineers are forced to do—work that doesn’t drive much business value but still has to get done. There’s so much variance across companies, and even within different orgs of the same company, that traditional software just can’t really help.
Within that space, there are two key areas: helping build net-new infrastructure—new pipelines, systems, workflows—and helping maintain what already exists. Both are really important, and there’s a ton of opportunity in streamlining both sides of that equation. Which one of those do you think is most interesting to dig into?
How would you distinguish between data engineering and software engineering in the context of TensorStax’s mission to build an AI data scientist? Which set of capabilities are you tackling first and why?
Data engineering is a lot more rigid than software engineering. If you send a software engineering agent like Devin to go build a frontend component, there are basically infinite ways to do that. You can implement a menu bar in a dozen different frameworks, styles, or file structures, and it’ll still work. But with data engineering, that flexibility doesn’t exist. If you’re working with a massive data lake or warehouse and trying to build out a model using dbt or Spark, there are only so many ways you can calculate the fields and model the data to get it in the shape you need.
Because of that rigidity, it becomes a much harder space for LLMs to operate in. They’re non-deterministic by nature, but in data engineering, the level of precision you need is extremely high—much higher than most other technical agent use cases. That’s something we’ve spent a lot of time on: figuring out how to reach that level of accuracy consistently.
There’s a ton of excitement and interest in the concept of AI agents. Hype aside, how are you actually thinking about the space and framing your efforts there in the context of where agents are today?
Another big focus for us has been on the agent side. From day one, we’ve built our own agent framework entirely from scratch—no agent builders or off-the-shelf stacks. And recently, we completely rewrote that framework. Everything is being pushed toward one goal: making these systems as deterministic as possible, which is absolutely necessary given how rigid and unforgiving the data engineering space is.
One thing most people don’t realize until they’re actually building agents—especially for high-precision domains like this—is that it’s really not glamorous work. It’s long, complex, and honestly, pretty boring. Most of your time goes into handling edge cases and weird out-of-distribution scenarios. It’s the kind of grinding that’s required to make something truly reliable.
What we’ve found is that building robust agentic systems means investing in two parallel tracks: one is the base model itself, and the other is the interface it uses to interact with the real world. For us, that meant building a middle layer between the LLM and all the existing infra—Airflow, dbt, Spark, etc.—which we internally call an LLM compiler. It abstracts those systems into something more LLM-friendly and acts as a safeguard against minor hallucinations by enforcing a more deterministic execution path.
Lastly, this middle layer strips away a lot of the unnecessary steps and parameters the agent would otherwise need to handle to interface with existing systems. A simple example: imagine a batch job where you want to run some Python code on AWS Batch to transform a billion rows from Snowflake. For a human, this is straightforward—you write the Python script, create a Dockerfile, push the container to a registry, define a job queue and job definition, then run it. But for an agent, replicating all of that step-by-step is a waste of capacity.
What we found is that you can abstract most of it away. Instead of having the agent build and manage the full job lifecycle, we just have it define what matters: the transformation code, the dependencies, and any relevant configuration. Everything else gets handled by the LLM compiler layer.
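To make that concrete, here’s a minimal sketch of what such an agent-facing spec could look like for the Snowflake-to-AWS-Batch example above. The `BatchTransformSpec` shape and the `compile_and_submit` entry point are illustrative assumptions, not TensorStax’s actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class BatchTransformSpec:
    # Hypothetical agent-facing spec: only the parts that actually matter.
    name: str                  # human-readable job name
    source: str                # where the data lives
    transform_code: str        # the Python the agent actually wrote
    dependencies: list = field(default_factory=list)  # pip packages
    config: dict = field(default_factory=dict)        # sizing, schedule, etc.

spec = BatchTransformSpec(
    name="snowflake_billion_row_transform",
    source="snowflake://analytics/raw_events",
    transform_code="df['amount_usd'] = df['amount_cents'] / 100",
    dependencies=["snowflake-connector-python", "pandas"],
    config={"vcpus": 16, "memory_gb": 64},
)

# A compiler layer would expand this into the Dockerfile, image push, job queue,
# and job definition a human would otherwise write by hand, e.g.:
# compile_and_submit(spec)   # hypothetical entry point, handled deterministically
print(spec)
```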
This approach massively boosts performance. In our internal benchmarks, agentic data engineering performance went from 40–50% with a base agent using just a Python tool to 85–90% success by running through this compiler layer—without touching the base model or doing any reinforcement learning. So that’s where a big chunk of our engineering focus is going right now—building the right interfaces into underlying tools that are abstracted enough to be LLM-friendly and can fix hallucinations deterministically.
What role would you say reinforcement learning plays in your approach this year?
I’d say our second major focus this year is reinforcement learning. It’s been a big area of interest for us—something I’ve been posting about since late 2023—and now we’re finally seeing the pieces come together. The key insight is that RL makes the most sense in agentic environments where the task is verifiable—and structured data engineering fits that perfectly.
Tasks like math, physics, and SQL all have clearly evaluable outcomes, which means you can collect high-quality reward signals. We recently ran an early RL benchmark on our 7B text-to-SQL model, which is designed to reason about query structure and datalake schemas before generating SQL. Even at this early stage, the improvements were meaningful:
Valid SQL (semantically meaningful queries): 24% → 66%
Executable SQL (queries that actually run): 10% → 31%
Row match accuracy (on executable queries): 10% → 21%
Column match accuracy: 57% → 75%
This model is still actively training, and we’re planning to release both 7B and 32B versions later this year.
What makes this especially powerful is how it ties into our LLM compiler layer. Because that infrastructure generates structured, deterministic feedback—like compilation success, execution errors, and schema mismatches—it gives us a clean reward signal on every agent action. That feedback loop is what makes reinforcement learning not just viable here, but genuinely transformative.
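As a rough illustration of how verifiable those signals are, here’s a toy reward function that grades a generated query on the same axes as the benchmark above (validity, executability, and result match). The scoring weights and the SQLite setup are assumptions made for the sketch, not TensorStax’s actual reward model:

```python
import sqlite3

def sql_reward(generated_sql: str, reference_sql: str, conn: sqlite3.Connection) -> float:
    """Toy reward: partial credit for planning, executing, and matching the reference."""
    reward = 0.0

    # 1. "Valid SQL": SQLite can build a query plan without raising.
    try:
        conn.execute("EXPLAIN QUERY PLAN " + generated_sql)
    except sqlite3.Error:
        return reward
    reward += 0.25

    # 2. Executable SQL: the query actually runs against the schema.
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return reward
    reward += 0.25

    # 3. Result match: exact agreement with the reference query's output.
    expected = conn.execute(reference_sql).fetchall()
    if got == expected:
        reward += 0.5
    return reward

# Example against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (id int, amount int)")
conn.execute("insert into orders values (1, 100), (2, 250)")
print(sql_reward("select sum(amount) from orders",
                 "select sum(amount) from orders", conn))  # -> 1.0
```

The point is simply that every term in the score is mechanically checkable, which is what makes the signal clean enough to train against.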
You’ve talked about the LLM Compiler layer a few times—can you explain what it is and how it fits into the system?
The LLM Compiler is one of the most important parts of the architecture. At a high level, it’s the translation layer between the agent and the actual systems it’s interacting with—like dbt, Airflow, Spark, etc. But it’s not just an interface wrapper—it actually restructures the task itself.
When an agent gets a request, instead of jumping straight to tool-specific code, we compile the task down into an intermediate format that’s much more abstract and LLM-friendly. Think of it like a set of structured instructions: the transformation logic, dependencies, resources, and configs—all in one clean spec. Then we map that spec to the specific output the tool needs.
What this lets us do is remove a ton of complexity from the agent's planning process. It doesn't have to worry about every single step—like how to construct a DAG definition or create a Docker container. The compiler handles all of that behind the scenes. That also gives us way more determinism and makes debugging way easier.
And because the compiler enforces structure, we can generate clean reward signals—like whether a pipeline compiled, ran, passed tests, etc.—which is perfect for training with reinforcement learning. It’s kind of the unsung hero that makes the whole system actually scalable.
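For a feel of that spec-to-tool mapping, here’s a tiny, hypothetical compiler step where the target happens to be a dbt model. The `ModelSpec` shape and the `compile_to_dbt` function are illustrative assumptions rather than TensorStax’s actual compiler:

```python
from dataclasses import dataclass

@dataclass
class ModelSpec:
    """Abstract, LLM-friendly description of one transformation."""
    name: str
    source_table: str       # upstream relation the model reads from
    select_exprs: list      # the columns / calculations that actually matter
    materialization: str = "view"

def compile_to_dbt(spec: ModelSpec) -> dict:
    """Map the abstract spec onto the file dbt actually expects."""
    cols = ",\n    ".join(spec.select_exprs)
    sql = (
        f"{{{{ config(materialized='{spec.materialization}') }}}}\n\n"
        f"select\n    {cols}\n"
        f"from {{{{ source('raw', '{spec.source_table}') }}}}\n"
    )
    # Return path -> contents; a real layer would also emit schema.yml, tests,
    # and wire the model into the orchestrator's DAG.
    return {f"models/{spec.name}.sql": sql}

spec = ModelSpec(
    name="orders_daily",
    source_table="orders",
    select_exprs=["order_date", "count(*) as order_count", "sum(amount) as revenue"],
    materialization="table",
)
print(compile_to_dbt(spec)[f"models/{spec.name}.sql"])
```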
You’ve historically been pretty focused on infrastructure. Any plans to open source parts of what you’re building?
We’re planning to open source a bunch of the internal tools we’ve built to make agents more reliable—things we’ve been using day-to-day to debug, evaluate, and train. We just released AgentTrace, which is a lightweight tracing and evaluation framework for agents and LLMs. It gives you local monitoring, a web UI, and a really simple API for tracking behavior and performance over time.
On top of that, we’re planning to roll out a few more things:
A routing layer for evaluating and dynamically selecting between different LLMs based on task type or performance
And a reinforcement learning environment builder specifically designed for structured data workflows—so we can run task-specific RL loops and get high-quality reward signals
All of this is aimed at helping people build agent systems that are actually reliable in production—not just demos. If we can make it easier for the community to evaluate, stress test, and improve their agents, that’s a big win.
How are you thinking about incorporating this new research into production on a regular cadence? What does that dynamic look like given that your eng team is still tight-knit?
So the plan is to have everything run on our own fine-tuned models. By the end of the year, the entire product will be powered by these fine-tuned, quantized models. The research org's job is to fine-tune these models, run reinforcement learning on top of them, and build RL environments that make them really good at working with the LLM compiler layer.
That’s where it gets exciting. Because of how that middle layer is structured, we’re able to get really clean, verifiable signal back, which makes fine-tuning way easier than in most other settings. The path to production is super clear. The product itself is structured with sub-agents tailored to specific environments. So if you're working with dbt, there's a dedicated dbt agent. Our research org can then focus on building a model that’s insanely good at interacting with dbt through that middle layer. Behind the scenes, we’ve built routing logic that lets us route tasks to the right model based on what the user is trying to do.
Over time, each agent will have its own specialized fine-tuned model that’s deeply optimized for a specific environment. I’d say this is a highly scalable path to outperforming even the best closed models on real enterprise SaaS tasks.
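As a loose sketch of that routing idea, here’s the simplest possible version: a keyword lookup from task description to a specialized sub-agent model. The model names are hypothetical, and a production router would presumably be learned, confidence-aware, and fallback-capable rather than rule-based:

```python
# Minimal, illustrative router: pick a specialized sub-agent model per task.
# The model names and matching rules are assumptions for this sketch.
SUBAGENT_MODELS = {
    "dbt":     "tensorstax-dbt-7b",      # hypothetical fine-tuned dbt specialist
    "airflow": "tensorstax-airflow-7b",  # hypothetical orchestration specialist
    "sql":     "tensorstax-sql-7b",      # hypothetical text-to-SQL specialist
}
DEFAULT_MODEL = "tensorstax-generalist-7b"

def route(task_description: str) -> str:
    """Return the model to use for a task, based on which tool it mentions."""
    text = task_description.lower()
    for keyword, model in SUBAGENT_MODELS.items():
        if keyword in text:
            return model
    return DEFAULT_MODEL

print(route("Add an incremental dbt model for daily revenue"))  # -> tensorstax-dbt-7b
```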
In September, you mentioned that TensorStax caters to both startups and Fortune 500s. Does your ICP change with the revised approach that you're taking, and which category do you think fits your approach better today?
It definitely changed a bit. One of the most important things for us—and for our investors—is understanding the value delta we’re delivering to customers. If you think about most startups, unless they’re explicitly a data company, they usually don’t have super complex data pipelines. In many cases, a single Python script is enough to transform data and push it wherever it needs to go.
So the more complex your data environment is—the more pipelines, the more tools like dbt, Airflow, Spark—the more value we can deliver on top of your existing stack. That also guides what we’re building. The LLM compiler layer, for example, is really focused on working seamlessly with those specific tools, just given how widely adopted they are across the market.
This naturally means our ideal customer profile is more mid-market and enterprise than early-stage startups. That’s where we’ve seen the most traction. Mid-market companies in particular have been a sweet spot—there’s enough complexity to benefit from what we’ve built, and they can move fast. We’ve also made way more progress with large enterprises since the last time we talked, and that’s been exciting to see.
Given your new raise, how do you see TensorStax evolving over the next 6-12 months? Any product developments that your key customers should be excited about?
There are really two components. One is pure capability—how complex can the pipelines and transformations get, and how deeply can the agent integrate into the ecosystem of tools? Tools like dbt have so much depth beyond just defining a basic view. There’s test creation, schema enforcement, dependency management, and so much more. Same with Airflow, Spark, Dagster, Fivetran—this ecosystem is massive, and we’re going deeper and wider across all of it.
The second part is around ongoing maintenance. A huge portion of a data engineer’s time isn’t spent building new pipelines—it’s spent fixing broken ones. For example, let’s say you have a DAG that ingests data from a third-party API into a warehouse. That API changes a field name or a type, and suddenly your whole DAG fails. You’ve got downstream breakage, and now someone has to investigate logs and manually patch it. What we want to build is a 30,000-foot view of all your data infra, where if something breaks, the agent can automatically parse logs, understand what failed, and either suggest or push a fix, depending on permissions.
This is the future of data engineering—agents that are always on, always watching, and fixing things in the background without needing to be prompted.
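To ground that maintenance scenario, here’s a minimal sketch of the kind of schema-drift check an always-on agent might run against that third-party API before the DAG ever fails. The field names and the `expected_schema` are illustrative assumptions:

```python
# Toy schema-drift check for an API-fed pipeline: compare what the source
# returns today against what the downstream DAG was built to expect.
expected_schema = {"user_id": "int", "signup_date": "string", "plan": "string"}

def detect_drift(incoming_record: dict, expected: dict) -> list[str]:
    """Return human-readable findings an agent could attach to a suggested fix."""
    findings = []
    for field in expected:
        if field not in incoming_record:
            findings.append(f"missing field '{field}' (renamed or removed upstream?)")
    for field in incoming_record:
        if field not in expected:
            findings.append(f"new/unmapped field '{field}' arriving from the API")
    return findings

# The third-party API silently renamed 'plan' to 'plan_tier':
print(detect_drift({"user_id": 1, "signup_date": "2025-01-01", "plan_tier": "pro"},
                   expected_schema))
```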
Lastly, after all is said and done - how do you feel like you’re going to either amplify or affect the role of a Data Scientist or an ML engineer? Do you think TensorStax makes them 10x productive or changes the nature of their value equation?
That's a really good question. This is something we’re very clear about with our customers—we’re going down the path of augmentation. We're not interested in replacing people or doing some massive workforce transformation. The way we think about it is: we're trying to create super soldiers. Super data engineers. There’s so much repetitive, time-consuming work these teams are stuck doing every single day. The talent pool is already tiny—there’s some stat like for every 250 software engineers, there’s only one data engineer.
You can't replace them even if you wanted to—it’s hard enough to find and hire great data engineers in the first place. Companies are holding onto them tightly. So our focus is on making those people radically more effective—10x to 100x more productive over time. We're building autonomous systems that let them start from square three, not square zero or negative one.
Conclusion
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email (newsletter@cerebralvalley.ai) or DM us on Twitter or LinkedIn.