Positron is pushing the boundaries of AI hardware 🔋

Plus: Founder/CEO Thomas on the technical decisions that set them apart...

CV Deep Dive

Today, we’re talking with Thomas Sohmers, Founder and CEO of Positron.

Positron is building hardware to accelerate transformer models, with a focus on making large-scale AI computation more efficient and cost-effective. Thomas, who has spent over a decade in the semiconductor industry and holds a Thiel Fellowship, worked on AI-specific hardware at companies such as Lambda and Groq before founding Positron in response to the growing demand for transformer-based models like those used in large language models.

Positron has just started shipping its first product, which is designed to provide significant performance and efficiency gains over GPUs from Nvidia, the leader in AI computation. By leveraging FPGAs, Positron's hardware offers a 3-4x advantage in both performance per dollar and performance per watt compared to traditional GPU-based systems. The company is targeting customers who need the power of large language models but don't want to rely on external cloud services, offering on-premises solutions that prioritize both security and performance.

In this conversation, Thomas shares Positron’s journey from idea to shipping hardware, the technical decisions that set them apart, and what’s next as they continue to push the boundaries of AI hardware.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Thomas 💬

Thomas - welcome to Cerebral Valley! First off, tell us a bit about your background and what led you to found Positron.

Hey there! I'm Thomas Sohmers, founder and CEO of Positron. I've been working in the semiconductor industry, building chips for a little over a decade now. I received a Thiel Fellowship back in 2013, which allowed me to move out to the Bay Area and start my first company, REX Computing, where, starting when I was 17, I designed and built processors for mobile base stations and high-performance computing.

After that, I worked on some cryptocurrency ASICs, then joined a friend's startup called Lambda as principal hardware architect, where I built out the Lambda GPU cloud, which is now one of the largest pure-play GPU clouds. That was really my first deep dive into AI compute, specifically using Nvidia GPUs. After that, I joined Groq as Director of Technology Strategy, and in the spring of last year, I decided to start Positron after seeing the explosive growth of generative AI workloads, specifically transformer computation, and realizing that it was a distinctly different type of workload that needed specialized compute to power the applications of the future.

Give us a top level overview of Positron - how would you describe your startup to those who are maybe less familiar with you? 

Positron is building hardware to accelerate transformer models. We just started shipping our first-generation product, which uses flexible, programmable chips called FPGAs, or Field Programmable Gate Arrays. Basically, they are a special type of chip that can be reconfigured at boot time into a new hardware design, effectively allowing you to "emulate" entirely new chip architectures without having to build custom silicon. We've done a lot of custom design work and built out a full system: a physical hardware appliance that can sit either on a customer's premises or in a cloud environment, along with all the software needed for a truly plug-and-play, frictionless way to serve transformer models.

Today, it delivers a 3-4x improvement in performance per dollar and performance per watt, along with a significant raw throughput advantage, compared to Nvidia GPU-based systems.

Give us a sense of who you’re serving - who’s finding the most value in what you’re building with Positron? 

We're still in the really early stages; our hardware has been shipping for less than a month. But the customers we're working with fall into two main categories. The first group consists of customers who want the power and capabilities of large language models but don't want to rely on a cloud service. These are companies with either proprietary models they want to deploy or proprietary data they don't want to share outside their environment. So it's similar to anyone using OpenAI or another token-as-a-service provider, but they want the models to run locally, on-premises.

The second group is cloud service providers themselves. A lot of them have already deployed large numbers of GPU-based systems, and we're able to offer them a solution that provides better performance in terms of tokens per second per user while also being more cost-effective to deploy and operate.

Inference has become a huge part of AI investment in the last two years. Could you give us an understanding of how Positron’s hardware differs from some of the other players in the market? 

This demo shows a head-to-head matchup between two of our Positron accelerator cards and two NVIDIA SXM5 H100 modules running Meta's Llama 3.1 8B model. It is designed to be as close to an apples-to-apples comparison as possible, with the exact same hyperparameters, quantization, and operation precision, and with both systems running the MMLU benchmark. You can see here that we're performing at around 270 tokens per second per user, while the Nvidia system is hovering around 160 tokens per second per user. You can also see that we are using about one third the power per token. With the systems we're shipping today costing less than half as much as an NVIDIA DGX H100, we can serve Llama and other transformer models with a 3-4x performance-per-dollar and performance-per-watt advantage while delivering a noticeable throughput advantage.
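As a sanity check on those numbers, here is a quick back-of-the-envelope calculation using only the relative figures quoted above; the "less than half the cost" claim is taken at face value as a 0.5 cost ratio, since exact system prices aren't given:

```python
# Back-of-the-envelope check of the 3-4x claim, using only the
# relative figures quoted in the demo (illustrative, not official).

positron_tps = 270             # tokens/sec/user (demo figure)
nvidia_tps = 160               # tokens/sec/user (demo figure)

cost_ratio = 0.5               # "less than half the cost" of a DGX H100
power_per_token_ratio = 1 / 3  # "about 1/3rd the power per token"

throughput_advantage = positron_tps / nvidia_tps     # ~1.7x per user
perf_per_dollar = throughput_advantage / cost_ratio  # ~3.4x
perf_per_watt = 1 / power_per_token_ratio            # 3.0x by definition

print(f"throughput advantage: {throughput_advantage:.2f}x")
print(f"perf per dollar:      {perf_per_dollar:.2f}x")
print(f"perf per watt:        {perf_per_watt:.2f}x")
```

Both derived ratios land inside the 3-4x range Thomas quotes.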

The key takeaway here isn't just the speed and performance for the end user's query; it's the economics for our customers, the providers themselves. Whether it's token-as-a-service providers or companies looking to leverage language models on their own proprietary data, we're able to deliver the same capabilities as Nvidia but at a fraction of the cost and power, and often faster.

Hardware for the AI revolution has long been respected as a challenging area to pursue. What has been the hardest technical challenge you’ve faced so far while building Positron? 

We’re really proud that, in less than 18 months from starting, we’ve been able to get to the point where we’re shipping production hardware. That’s pretty much unheard of for hardware startups. The key driver behind this is that, unlike most other AI chip or hardware companies, we’re leveraging FPGAs (Field Programmable Gate Arrays) as the execution engine in the solution we’re building and delivering today. 

FPGAs have allowed us to get to market much faster and with lower NRE (non-recurring engineering) costs. While FPGAs may not be as efficient or performant as ASICs, using them means we can quickly iterate on designs and get hardware into customers' hands early in the process. Most companies take three to five years before they can ship a product, but by the time their chips are ready, the market may have already moved on. 

We felt it was far more important to iterate quickly and learn from real customer use, rather than spending years on building a chip that might be outdated by the time it launches. That approach has allowed us to move faster with fewer resources than almost any other AI hardware company out there. But yeah, we haven’t slept much in the last year and a half!

There’s been an explosion of interest in agentic workflows and multi-modal AI. How has that shaped the way you’re thinking about building Positron? 

We’ve really focused on the layer above just the hardware, recognizing that transformer models are the future. Right now, large language models are the most pervasive, with ChatGPT being a major application using transformers, but our technology is built to support more than just that. We can handle image transformers, video transformers, and other multimodal applications. This means we can support a range of tasks, whether it’s analysis or generation, across different types of data. Fundamentally, we think the limiter for the deployment of these applications is the economics of scaled inference, and Positron can provide a 4x performance per dollar advantage today and more in the future.

How do you plan on Positron progressing over the next 6-12 months? And is partnering with data centers a strategy on the horizon? 

We decided to start shipping because we hit the minimum feature set and stability that both we and our early customers agreed on. Even though the performance I showed earlier, which exceeds Nvidia's, is impressive, it's not the endpoint for us with this hardware. Over the next six to nine months, we expect to more than double the current performance and add significant new capabilities, like speculative decoding and paged attention: things that many people take for granted on existing platforms but that we're not even doing yet. These features will come through software updates. And of course, being a hardware company, we have a product roadmap that includes our next-generation product, already in development, which goes beyond our current FPGA-based devices.
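For readers unfamiliar with speculative decoding, here is a minimal sketch of the general technique, not Positron's implementation (`draft_model` and `target_model` are hypothetical stand-ins): a small draft model cheaply proposes a few tokens, and the large target model verifies them all in one forward pass, so several tokens can be accepted per expensive model invocation.

```python
# Minimal sketch of greedy speculative decoding (the general
# technique, not Positron's implementation). `draft_model` and
# `target_model` are hypothetical callables: `draft_model(ctx)`
# returns the next token; `target_model(prefix, proposed)` returns,
# in one forward pass, the token it would emit at each position.

def speculative_step(target_model, draft_model, prefix, k=4):
    # 1. Draft model proposes k tokens autoregressively (cheap).
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. Target model checks all k positions in a single pass
    #    (the expensive call, amortized across k candidates).
    verified = target_model(prefix, proposed)

    # 3. Keep the longest agreeing prefix; on the first mismatch,
    #    take the target's token instead and stop. (Full versions
    #    also emit a bonus token when all k proposals are accepted.)
    accepted = []
    for p, v in zip(proposed, verified):
        accepted.append(v)
        if p != v:
            break
    return list(prefix) + accepted
```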

Key to our business plan is partnering with existing cloud providers, data centers, and the like, rather than trying to compete with them. Some of the other companies I've referenced end up deploying their own hardware because their solutions aren't compelling enough to actually sell, and as a result, they end up spending a massive amount of capex and effort, and they're losing money doing it.

For us, we’re selling based on the merits of our hardware and the value it generates for those who purchase it. We do have our own hardware deployed in our office to support try-before-you-buy, giving people the chance to experience Positron without making a huge upfront investment. But ultimately, we don’t want to compete against companies that are much better equipped than us to build out and provide services on top of our hardware.

What do you think Positron’s competitive edge is in the space you’re operating in? What are you doing differently from an architecture perspective? Is it using FPGAs? 

The key differentiator between us and both the incumbent GPU architecture and most of the other AI chip startups is our primary focus on the memory system. Transformers, unlike the convolutional neural networks (CNNs) that were the main focus of AI silicon for the past decade, are not simply accelerated by throwing more FLOPs at the problem; they are heavily memory-bound during inference. Our magic is in our ability to achieve nearly perfect utilization of our external memory bandwidth, while GPU architectures are typically only able to achieve 10% to 30% of their theoretical memory bandwidth, which directly impacts raw performance and efficiency. We can also attach multiple external memory technologies to our devices simultaneously, getting the speed advantages of HBM while also having the capacity advantages of DDR5, enabling massive context lengths, many concurrent users, and scalability to the multi-trillion-parameter model sizes of the future.
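To make that memory-bound point concrete, here is a rough model with illustrative, assumed numbers (an H100-class ~3.35 TB/s HBM figure and an 8B-parameter model in FP16; none of these are Positron's measured figures): during autoregressive decoding, every generated token requires streaming the model weights from memory once, so achieved bandwidth divided by model size bounds the per-user token rate.

```python
# Rough upper bound for memory-bound decoding: each token requires
# streaming all weights once, so tok/s <= achieved_bandwidth / model_bytes.
# All numbers below are illustrative assumptions, not measured figures.

def max_tokens_per_sec(bandwidth_tb_s, utilization, params_billion,
                       bytes_per_param=2):
    model_bytes = params_billion * 1e9 * bytes_per_param  # FP16 weights
    achieved = bandwidth_tb_s * 1e12 * utilization        # bytes/sec
    return achieved / model_bytes

BW = 3.35  # TB/s, roughly one H100's HBM3 bandwidth (assumption)
for util in (0.15, 0.30, 0.90):
    rate = max_tokens_per_sec(BW, util, params_billion=8)
    print(f"{util:.0%} bandwidth utilization -> ~{rate:.0f} tok/s/user")
```

The absolute numbers ignore KV-cache traffic, batching, and multi-device scaling, but the proportionality is the point: going from 30% to near-perfect bandwidth utilization roughly triples the memory-bound token rate on the same memory technology.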

We’re not stuck with FPGAs, and our next-generation product is going to be an ASIC. A key thing about our ASIC design, without going into any of the technical details, is that our ASIC is being designed with the heritage and understanding of real customer needs, and us having delivered a product that actually solves customers' problems and is at a price and performance point that is better than Nvidia. If you look at a lot of other companies out there that already have ASICs and have started to build services and things around it, fundamentally, the companies that are providing cloud services are doing it as a way to hide the fact that their silicon actually isn’t economical and doesn’t have any real advantages over Nvidia on a chip-to-chip comparison.

The key thing for us is driving performance per dollar and performance per watt. So, as we use all the tricks up our sleeves to improve our performance, performance per dollar, and so on, we're still staying true to what customers actually demand, which is having a real economic basis for their purchase decision. Meanwhile, the other AI chips out there that claim to compete against Nvidia fundamentally don't have a true economic advantage; if they did, they would actually be selling, and they would have had more luck selling the products they've announced.

Lastly, tell us a little bit about the team and culture at Positron. How did you bring the original group together, and what do you look for in prospective team members?

The vast majority of our team has had the great fortune of working at some of the best companies in their fields, whether in CPU or FPGA design, manufacturing, networking, and so on. But a really key thing is that even though everyone has reached senior positions and taken on strategic responsibilities, at their core, everyone is still a real engineer who's ready to get their hands dirty for any task that's necessary. We're super proud of our combined heritage, with our 21 employees having over 500 years of chip and system design experience between them. Even though our collective past experience includes shipping billions of dollars' worth of products across dozens of generations of high-volume hardware, egos are checked at the door and everyone is working tirelessly to bring Positron's technology to the world.

One of the core values we started the company with, and something I hope we can carry on forever, is that no job is above or below anyone. Even the most junior person should feel comfortable taking risks and tackling problems that might seem impossible. And whether it's me as CEO or anyone else on the leadership team, we should be willing to scrub floors, rack and stack servers, or sit down for six hours to whiteboard a really intricate engineering solution. I feel like this mindset is key to a startup in general and is something that tends to get lost as companies grow. At the heart of it all is solving problems: keeping that core engineering drive throughout the company.

Anything else you'd like people to know about the work you're doing at Positron?

First and foremost, we want to make sure that the future of machine learning and generative AI is cost-effective and doesn't become a huge strain on the planet's resources. That's really why, beyond just trying to build a business and make money, we fundamentally see that the path of just spending billions of dollars and tens to hundreds of gigawatts on Nvidia-based hardware is not sustainable for the world.

We want to be a provider for everyone, but especially for the companies that are being left on the sidelines while the major hyperscalers are the ones getting all the GPU allocation. As dystopian as it sounds, imagining a world full of data centers with no space for humans to live is concerning. But it's almost as bad if those data centers are controlled solely by companies like Microsoft, Google, Meta, etc. We want to make sure that small companies still have the capability to do machine learning, development, inference, and more, without being left out as the world continues down this path.

Conclusion

To stay up to date on the latest with Positron, learn more about them here.

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.