Positron is pushing the boundaries of AI hardware
Plus: Founder/CEO Thomas on the technical decisions that set them apart...
CV Deep Dive
Today, we're talking with Thomas Sohmers, Founder and CEO of Positron.
Positron is building hardware to accelerate transformer models, with a focus on making large-scale AI computation more efficient and cost-effective. Thomas, who has a long history in the semiconductor industry, founded Positron after seeing the growing demand for transformer-based models like those used in large language models. With a decade of experience and a Thiel Fellowship under his belt, Thomas has worked on AI-specific hardware for companies such as Lambda and Groq before deciding to launch Positron.
Positron has just started shipping its first product, which is designed to provide significant performance and efficiency gains over GPUs from Nvidia, the leader in AI computation. By leveraging FPGA chips, Positron's hardware offers a 3-4x advantage in both performance per dollar and performance per watt compared to traditional GPU-based systems. The company is targeting customers who need the power of large language models but don't want to rely on external cloud services, offering on-premises solutions that prioritize both security and performance.
In this conversation, Thomas shares Positron's journey from idea to shipping hardware, the technical decisions that set them apart, and what's next as they continue to push the boundaries of AI hardware.
Let's dive in ⚡️
Read time: 8 mins
Our Chat with Thomas 💬
Thomas - welcome to Cerebral Valley! First off, give us a bit about your background and what led you to found Positron?
Hey there! I'm Thomas Sohmers, founder and CEO of Positron. I've been working in the semiconductor industry, building chips for a little over a decade now. I received a Thiel Fellowship back in 2013, which allowed me to move out to the Bay Area and start my first company, REX Computing, where I designed and built processors for mobile base stations and high performance computing starting when I was 17.
After that, I worked on some cryptocurrency ASICs, then joined a friend's startup called Lambda as principal hardware architect, where I built out the Lambda GPU cloud, which is now one of the largest pure-play GPU clouds. That was really my first deep dive into AI compute, specifically using Nvidia GPUs. After that, I joined Groq as Director of Technology Strategy, and in the spring of last year, I decided to start Positron after seeing the explosive growth of generative AI workloads, specifically transformer computation, and realizing that it was a distinctly different type of workload that needed specialized compute to power the applications of the future.
Give us a top-level overview of Positron - how would you describe your startup to those who are maybe less familiar with you?
Positron is building hardware to accelerate transformer models. We just started shipping our first-generation product, which uses flexible, programmable chips called FPGAs, or Field Programmable Gate Arrays. Basically, they are a special type of chip that can be reconfigured at boot time into a new hardware design, effectively allowing you to "emulate" entirely new chip architectures without having to build custom silicon. We've done a lot of custom design work and built out a full system: a physical hardware appliance that can sit either on a customer's premises or in a cloud environment, along with all the software for a truly plug-and-play, frictionless way to serve transformer models.
Today, it delivers a 3-4x improvement in performance, performance per dollar, and performance per watt compared to Nvidia GPU-based systems.
Give us a sense of who you're serving - who's finding the most value in what you're building with Positron?
We're still in the really early stages: our hardware has been shipping for less than a month. But the customers we're working with fall into two main categories. The first group consists of customers who want the power and capabilities of large language models but don't want to rely on a cloud service. These are companies with either proprietary models they want to deploy or proprietary data they don't want to share outside their environment. So it's similar to anyone using OpenAI or another token-as-a-service provider, but they want the models to run locally, on-premises.
The second group is cloud service providers themselves. A lot of them have already deployed large numbers of GPU-based systems, and we're able to offer them a solution that provides better performance in terms of tokens per second per user while also being more cost-effective to deploy and operate.
Inference has become a huge part of AI investment in the last two years. Could you give us an understanding of how Positron's hardware differs from some of the other players in the market?
This demo shows a head-to-head matchup between two of our Positron accelerator cards and two NVIDIA SXM5 H100 modules running Meta's Llama 3.1 8B model. It is designed to be as close to an apples-to-apples comparison as possible: the exact same hyperparameters, quantization, operation precision, etc., with both systems running the MMLU benchmark. You can see here that we're performing at around 270 tokens per second per user, while the Nvidia system is hovering around 160 tokens per second per user. You can also see that we are using about one third the power per token. With our systems shipping today at less than half the cost of an NVIDIA DGX H100, we can enable inference of Llama and other transformer models with a 3-4x performance-per-dollar and performance-per-watt advantage while delivering a noticeable throughput advantage.
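To make the arithmetic behind those multipliers explicit, here is a quick back-of-the-envelope check in Python using only the figures quoted above; the 0.5x cost ratio stands in for "less than half the cost" and is an assumption, not an exact price.

```python
# Back-of-the-envelope check of the 3-4x claims, using the figures
# quoted in the demo above. The cost ratio (0.5x, "less than half")
# and power ratio (1/3 the power per token) are as stated; treat the
# outputs as rough magnitudes, not measurements.

positron_tps = 270            # tokens/sec/user, two Positron cards
nvidia_tps = 160              # tokens/sec/user, two H100 SXM5 modules
cost_ratio = 0.5              # Positron price / DGX H100 price (assumed)
power_per_token_ratio = 1/3   # Positron energy per token vs. Nvidia

throughput_gain = positron_tps / nvidia_tps        # ~1.7x raw speed
perf_per_dollar = throughput_gain / cost_ratio     # ~3.4x
perf_per_watt = 1 / power_per_token_ratio          # ~3.0x per token

print(f"throughput:  {throughput_gain:.1f}x")
print(f"perf/dollar: {perf_per_dollar:.1f}x")
print(f"perf/watt:   {perf_per_watt:.1f}x")
```

Run as-is, this prints roughly 1.7x, 3.4x, and 3.0x, which is where the "3 to 4x" framing comes from.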
The key takeaway here isn't just the speed and performance for the end-user query, but for our customers: the providers themselves. Whether it's token-as-a-service providers or companies looking to leverage language models on their own proprietary data, we're able to deliver the same capabilities as Nvidia but at a fraction of the cost and power, and often faster.
Hardware for the AI revolution has long been regarded as a challenging area to pursue. What has been the hardest technical challenge you've faced so far while building Positron?
We're really proud that, in less than 18 months from starting, we've been able to get to the point where we're shipping production hardware. That's pretty much unheard of for hardware startups. The key driver behind this is that, unlike most other AI chip or hardware companies, we're leveraging FPGAs (Field Programmable Gate Arrays) as the execution engine in the solution we're building and delivering today.
FPGAs have allowed us to get to market much faster and with lower NRE (non-recurring engineering) costs. While FPGAs may not be as efficient or performant as ASICs, using them means we can quickly iterate on designs and get hardware into customers' hands early in the process. Most companies take three to five years before they can ship a product, but by the time their chips are ready, the market may have already moved on.
We felt it was far more important to iterate quickly and learn from real customer use, rather than spending years building a chip that might be outdated by the time it launches. That approach has allowed us to move faster with fewer resources than almost any other AI hardware company out there. But yeah, we haven't slept much in the last year and a half!
There's been an explosion of interest in agentic workflows and multi-modal AI. How has that shaped the way you're thinking about building Positron?
We've really focused on the layer above just the hardware, recognizing that transformer models are the future. Right now, large language models are the most pervasive, with ChatGPT being a major application using transformers, but our technology is built to support more than just that. We can handle image transformers, video transformers, and other multimodal applications. This means we can support a range of tasks, whether it's analysis or generation, across different types of data. Fundamentally, we think the limiter on deploying these applications is the economics of scaled inference, and Positron can provide a 4x performance-per-dollar advantage today and more in the future.
How do you see Positron progressing over the next 6-12 months? And is partnering with data centers a strategy on the horizon?
We decided to start shipping because we hit the minimum feature set and stability that both we and our early customers agreed on. Even though the performance I showed earlier, which exceeds Nvidia's, is impressive, it's not the endpoint for us with this hardware. Over the next six to nine months, we expect to more than double the current performance and add significant new capabilities, like speculative decoding and paged attention, things that many people take for granted on existing platforms but that we're not even doing yet. These features will come through software updates, and of course, being a hardware company, we have a product roadmap that includes our next-generation product, already in development beyond our current FPGA-based devices.
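For readers unfamiliar with paged attention: the core idea, popularized by the vLLM project, is to store each sequence's KV cache in fixed-size blocks reached through a per-sequence block table, instead of one large contiguous buffer, so memory is allocated on demand as sequences grow. The sketch below is a minimal, hypothetical illustration of that bookkeeping; the class, names, and block size are ours for exposition and say nothing about Positron's actual implementation.

```python
import numpy as np

BLOCK_SIZE = 16  # KV-cache entries per block (illustrative choice)

class PagedKVCache:
    """Minimal sketch of a paged KV cache: keys/values live in fixed-size
    physical blocks, and a per-sequence block table maps logical token
    positions to those blocks, so memory is claimed only as needed."""

    def __init__(self, num_blocks, num_heads, head_dim):
        self.k = np.zeros((num_blocks, BLOCK_SIZE, num_heads, head_dim))
        self.v = np.zeros_like(self.k)
        self.free = list(range(num_blocks))  # pool of unused physical blocks
        self.block_tables = {}               # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored so far

    def append(self, seq_id, k_vec, v_vec):
        """Store one new token's key/value vectors for a sequence."""
        n = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:              # first token, or current block
            table.append(self.free.pop())    # full: claim a fresh block
        block, slot = table[n // BLOCK_SIZE], n % BLOCK_SIZE
        self.k[block, slot] = k_vec
        self.v[block, slot] = v_vec
        self.lengths[seq_id] = n + 1

cache = PagedKVCache(num_blocks=64, num_heads=8, head_dim=64)
for _ in range(40):                          # 40 decode steps for one sequence
    cache.append(0, np.random.randn(8, 64), np.random.randn(8, 64))
print(cache.block_tables[0])                 # 3 blocks cover 40 tokens
```

The payoff is that fragmentation and over-allocation largely disappear, which matters most when serving many concurrent users with widely varying context lengths.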
Key to our business plan is partnering with existing cloud providers, data centers, and the like, rather than trying to compete with them. Some of the other companies I've referenced end up deploying their own hardware because they don't have a compelling enough solution to actually sell it, and as a result, they end up spending a massive amount of capex and effort, and they're losing money doing it.
For us, we're selling based on the merits of our hardware and the value it generates for those who purchase it. We do have our own hardware deployed in our office to support try-before-you-buy, giving people the chance to experience Positron without making a huge upfront investment. But ultimately, we don't want to compete against companies that are much better equipped than us to build out and provide services on top of our hardware.
What do you think Positron's competitive edge is in the space you're operating in? What are you doing differently from an architecture perspective? Is it using FPGAs?
The key differentiator between us and both the incumbent GPU architecture and most of the other AI chip startups is our primary focus on the memory system. Transformers, unlike the convolutional neural networks (CNNs) that were the main focus of AI silicon for the past decade, are not simply accelerated by throwing more FLOPs at the problem; they are heavily memory-bound during inference. Our magic is in our ability to achieve nearly perfect utilization of our external memory bandwidth, while GPU architectures are typically only able to achieve 10% to 30% of their theoretical memory bandwidth, which directly impacts raw performance and efficiency. We can also attach multiple external memory technologies to our devices simultaneously, getting the speed advantages of HBM while also having the capacity advantages of DDR5, enabling massive context lengths, many concurrent users, and scalability to the multi-trillion-parameter model sizes of the future.
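To see why the memory system dominates, note that each decoded token requires streaming essentially all of the model's weights from external memory, so sustained bandwidth, not FLOPs, caps throughput. Below is a rough roofline sketch of that bound; the model size, bandwidth, and utilization numbers are illustrative assumptions, not measured Positron or Nvidia figures.

```python
# Roofline for memory-bound decoding: tokens/sec (at batch size 1) is
# bounded by effective memory bandwidth divided by the bytes of weights
# streamed per token. All numbers here are illustrative assumptions.

def decode_roofline(params_billion, bytes_per_param, peak_bw_tbs, utilization):
    """Upper bound on tokens/sec for one user generating one token at a time."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    effective_bw = peak_bw_tbs * 1e12 * utilization  # bytes/sec actually achieved
    return effective_bw / weight_bytes

# Example: an 8B-parameter model at FP16 (2 bytes/param) on a part with
# ~3.35 TB/s of peak HBM bandwidth, at two utilization levels.
for util in (0.3, 0.9):  # ~30% (the GPU range cited above) vs. ~90%
    bound = decode_roofline(8, 2, 3.35, util)
    print(f"{util:.0%} of peak bandwidth -> ~{bound:.0f} tokens/sec upper bound")
```

The gap between the two printed bounds (roughly 63 vs. 188 tokens/sec under these assumptions) is exactly the kind of headroom that better bandwidth utilization unlocks.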
We're not stuck with FPGAs, and our next-generation product is going to be an ASIC. A key thing about our ASIC, without going into any of the technical details, is that it is being designed with the heritage and understanding of real customer needs, and with us having already delivered a product that actually solves customers' problems at a price and performance point better than Nvidia's. If you look at a lot of the other companies out there that already have ASICs and have started to build services around them, fundamentally, the ones providing cloud services are doing it as a way to hide the fact that their silicon isn't economical and doesn't have any real advantage over Nvidia in a chip-to-chip comparison.
The key thing for us is driving performance per dollar and performance per watt. So, as we use all the tricks up our sleeves to improve performance, performance per dollar, and so on, we're staying true to what customers actually demand, which is a real economic basis for their purchase decision. If you look at the other AI chips out there that claim to compete against Nvidia, fundamentally, they don't have a true economic advantage, or they would actually be selling, and they would have had more success with the products they announced.
Lastly, tell us a little bit about the team and culture at Positron. How did you bring the original group together, and what do you look for in prospective team members?
The vast majority of our team has had the great fortune of working at some of the best companies in their fields, whether in CPU or FPGA design, manufacturing, networking, and so on. But a really key thing is that, even though everyone has reached senior positions and taken on strategic responsibilities, at their core, everyone is still a real engineer who's ready to get their hands dirty for any task that's necessary. We're super proud of our combined heritage, with our 21 employees having over 500 years of chip and system design experience. Even though our collective past experience includes shipping billions of dollars' worth and dozens of generations of high-volume products, egos are checked at the door and everyone is working tirelessly to bring Positron's technology to the world.
One of the core values we started the company with, and something I hope we can carry on forever, is that no job is above or below anyone. Even the most junior person should feel comfortable taking risks and tackling problems that might seem impossible. And as CEO, or as anyone on the leadership team, we should be willing to scrub floors, rack and stack servers, or sit down for six hours to whiteboard a really intricate engineering solution. I feel like this mindset is key to a startup in general and is something that tends to get lost as companies grow. At the heart of it all is solving problems: keeping that core engineering drive throughout the company.
Anything else you'd like people to know about the work you're doing at Positron?
First and foremost, we want to make sure that the future of machine learning and generative AI is cost-effective and doesn't become a huge strain on the planet's resources. That's really why, beyond just trying to build a business and make money, we fundamentally see that the path of just spending billions of dollars and tens to hundreds of gigawatts on Nvidia-based hardware is not sustainable for the world.
We want to be a provider for everyone, but especially for the companies that are being left to the side while the major hyperscalers are the ones getting all the GPU allocation. As dystopian as it sounds, imagining a world full of data centers with no space for humans to live is concerning. But it's almost as bad if those data centers are controlled solely by companies like Microsoft, Google, Meta, etc. We want to make sure that small companies still have the capability to do machine learning development, inference, and more, without being left out as the world continues down this path.
Conclusion
To stay up to date on the latest with Positron, learn more about them here.
Read our past few Deep Dives below:
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.