
Our chat with Groq's Chief Evangelist, Mark Heaps (Pt. 2)

Mark on the AI stack, multimodal and Groq's team culture...

CV Deep Dive

Welcome to Part 2 of our chat with Mark Heaps of Groq.

Today, Mark walks us through his thoughts on Groq’s role in the AI stack, the rise of multimodal, and what makes their team special. Check out Part 1 here.

Groq is a startup at the center of the AI-driven chip revolution, and is the brainchild of Jonathan Ross, who was previously a lead inventor of Google’s TPU (Tensor Processing Unit). Founded in 2016, Groq’s stated mission is to ‘set the standard for AI inference speed’ via its proprietary LPU (Language Processing Unit), a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications such as LLMs.

Just two weeks ago, Groq went viral on X for its demos of GroqChat - a consumer-facing application that allows users to query any model via a text input, similar in UI to ChatGPT. Almost overnight, thousands of users began posting snippets of Groq’s blazingly fast output speeds - seemingly far quicker than anything previously seen from alternative LLM products. In a single week, Groq’s user numbers shot up from under 10,000 developers to over 370,000.

Today, Groq’s API waitlist numbers in the tens of thousands, and the team is fielding strong demand from Fortune 50s and individual developers alike - all clamoring to incorporate Groq into their own AI applications, as the global race for compute heats up. The startup has received backing from Chamath Palihapitiya’s Social Capital, Tiger Global and D1 Capital, amongst others.

Let’s dive in ⚡️

Read time: 8 mins

Our Chat with Mark (Pt. 2) 💬

Mark - earlier, you mentioned exploring fine-tuning and RAG capabilities within Groq. Are there plans for Groq to serve developers and enterprises across the entire AI stack?  

This is an obvious thing for us to do - I don't think it's a matter of if we'll do it, but a matter of when. I also think the market will dictate this for us. Right now, we're heavily focused on partnering with the right companies for services like fine-tuning and RAG - we want to bring these companies into the fold and learn about what they're doing.

A unique part about Groq is that we don't store any of our users’ conversations - we actually haven't built this capability into the system. While companies like OpenAI have an incentive to save your conversations so that they can use the data for training, research and evaluations, we actually want all of that to be in the user's control. If you want to save your chat history, you will be able to turn it on, or connect it to your own storage service. We want you to own all that data because we're doing inference, not training, and so there's little incentive for us to have any of it.

So, that’s an example of a service where we looked at it and asked ourselves “are we ever going to do training?” Maybe one day, but there are companies out there today that do it really well. We are working on a fine-tuning solution, as well as some of these other RAG solutions and LoRAs, but mostly in exploration. As more customers sign up, the user base will dictate where their needs are. The great thing is, because it's so easy to get your models onto the Groq systems, we can support you while you are working with a fine-tuning vendor, until that's something that we potentially offer down the line.

The demo that has gone viral is known as GroqChat. Do you have plans to spin that out into an actual consumer application like ChatGPT, or is it purely for demo purposes? 

First and foremost, GroqChat is a user-facing tool for people to experience Groq’s speed. We realize it serves what most people use OpenAI or other services for, and we have the ability to benefit from the open source community iterating on models - which in essence gives you a mixture-of-experts or multimodal approach, rather than OpenAI iterating on what they have each generation. That said, where we really want to focus right now is giving access to GroqChat to as many developers as possible, because that goes back to Jonathan's founding vision.

The second question is, how do we integrate GroqChat into existing services? We’ve already integrated into poe.com, where we're the preferred provider for Mistral and Llama. We're also looking into other integrations - for example, how would you integrate us into Slack, or a number of other large applications? In some cases, we’re talking to some very large software companies to explore making Groq a tool within their industry-standard software. That said, we’re not planning to have consumers pay $20 a month for GroqChat at this time.

Multimodal models have made great strides recently, as seen by releases from OpenAI and Google. Does Groq have plans to integrate multimodal into its existing offering? 

As far as multimodal goes, we have a three-pronged strategy that we're working on.

The first piece was to demonstrate Groq’s performance on language to the world. Again, thanks to the open source community, we can now do that. The second of the three pieces is actually audio - we recognize that most developers are playing with speech-to-text and text-to-speech right now, using models like Whisper and platforms like Eleven Labs, Play.ht and Deepgram. So, we're now compiling those models with the goal of letting developers make API calls on Groq to those models and more. This is a focus of ours right now, and we have a couple of customers that are actually testing their projects with these.

The third area of focus is getting Groq into the image side of things. To be clear, if you went to our YouTube channel and watched some of our previous Groq Day events, you’d see that we were actually reaching a 30x performance improvement 2 years ago when we were running YOLO as a computer vision model and doing detection. That was exciting because it showed our ability to process streaming visual information.

The challenge with image has always been latency - for example, a product manager from Adobe suggested we use a StyleGAN, and at first, it took around 7 seconds to output at 512 resolution. We managed to take 8 of those stylings, put one per chip in a single node, and found we could upload an image at 1024 resolution - 2x the original - and it would apply all 8 styles in parallel at 1024 in 0.186 seconds. So, we went from 7 seconds to under a fifth of a second - and that was another element of the three-pronged strategy, where we thought “let's show that we can do image applications, even if we're not a GPU”. It's simply that it would have to be a model that processes the data linearly, and that's again where we're advantaged.
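As a back-of-envelope check on the numbers Mark quotes above (illustrative figures from the anecdote, not official benchmarks - the "effective throughput" framing is our own reading of the 8-styles-in-parallel claim):

```python
# Figures from the anecdote: one StyleGAN-style filter took ~7 s per image on
# the original setup, while Groq applied 8 styles in parallel (one per chip in
# a single node) in ~0.186 s total.
baseline_s = 7.0   # seconds per styled image, original setup
groq_s = 0.186     # seconds for all 8 styles in parallel on Groq

per_image_speedup = baseline_s / groq_s            # single-style latency gain
effective_throughput = 8 * baseline_s / groq_s     # counting all 8 parallel styles

print(f"latency speedup: ~{per_image_speedup:.1f}x")        # roughly 37.6x
print(f"effective throughput gain: ~{effective_throughput:.0f}x")  # roughly 301x
```

That 0.186 s figure is what lets Mark round to "under a fifth of a second" - and the parallelism across chips is doing as much work as the per-chip latency.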

So, we've got a lot of multimodality work going on right now, and our goal in the next quarter is to get that into our surface application where people are experiencing the speed. From there, we’ll continue to roll that out to the developer community.

Jonathan originally comes from a non-academic background but Groq is composed of many research teams. How are you balancing the need for research innovation, with the need for commercial product releases? Is there a shift in thinking on this internally given the past few weeks? 

This has been one of the most dramatic areas of evolution for us internally. Ours is a company filled with PhDs, and in the early days, it was the academic community that validated our research and engineering and the science behind what we’ve been building. Over time, you can mature into a business that’s truly productizing and providing something that solves problems for individuals and enterprises at scale. Internally, this makes you realize you don't need to show every chart and every table - what users care about most is user experience and reliability, and that shifts the culture of your thinking.

One approach takes a more formal academic outlook, while the other one is a true experimental mindset. That said, the core philosophy that runs deep within the company is ‘make it real’ - it's these three words that unite the academic and the commercial side of things. Obviously, research is a huge component of what we do, but now we can also lean on our relationships with the national labs and the developer community versus needing to do it all ourselves, which was where we were a few years ago.

Now, what we want to do is understand which business problems our technology is best suited to solve. This is really what’s key to driving us forward, we want to solve problems.

Share a little bit about the team culture at Groq. What makes this team unique, and what do you look for in prospective team members? 

We're roughly 200 people. We're actually a relatively small company, if you compare us to other accelerator startups. It’s also interesting saying ‘startup’ when we’ve been around since 2016, but I think that's the badge you wear until you've started taking on massive numbers of customers. Culture-wise, Jonathan has always made us focus on talent density, and we have three pillars that we focus on at the company.

The first pillar is Team, and we have a commitment from the executive level that says “we will build teams that intellectually stimulate and challenge each other”. So we want everyone to know that when you join Groq, our commitment is to surround you with the best, and that has been true since the day that I walked through the door.

The second pillar is Growth. A lot of companies think about growing the company, but our perspective is different - what we ask ourselves is how do we grow each individual? When I took over interviewing for culture, Jonathan told me to ask myself with each candidate, “can we help this person grow in two dimensions of their life - whether that's a technical skill, a personal skill, a soft skill, can we help this person grow two times?” And we need to make the commitment to do that for those people.

The third pillar is Innovation. We value first principles thinking, and are very much not into accepting the blueprint of things. We have an ethos internally called ‘defy gravity’, which we print on our t-shirts and in other places. What that means is, we all gain experience and become experts in our fields working at large incumbents, and then when you come to join a new company, you'll hear people bring their old habits and company cultures with them. Jonathan would say “let’s think about what we want to do here - don't just accept the gravity of that experience. Bring the knowledge from your past, but decide what we should do based on the here and now.” 

And so those are kind of the three pillars around the culture. It's also a very open culture - we operate with what we call the ‘spirit of rough drafts’. Everybody should feel safe to share their rough drafts often and early. These are challenging times, though - we’re being tested right now with the amount of interest and the amount of load. Many engineers, literally since the virality moment happened, have been working around the clock. Some of them are on call to watch the systems, and we're very proud of the fact the systems haven't gone down.

It's proof that our engineering is just filled with amazing people - but this is a great pressure cooker test for us right now, and so far, everyone's handling it wonderfully.

Tell us a little bit about Jonathan himself. What is it like to work with him every day, and how has his unique approach and background shaped the way Groq operates? 

One of the ways Jonathan really influences us all is that he truly keeps a pulse on the industry - he's out there speaking to other industry leaders every day. He’s also really adaptive to change, whereas a large incumbent has to operate like a cruise ship - once they're going, it's hard to slow down and steer. I’d say Jonathan tries to operate the company much more like a speedboat - if you see something in the way up ahead, turn quickly! So he will very much take projects and programs and say “hey, I just learned this, we're going to research it real quick but I think there's something here. Let's change.”

Mistral and Mixtral was one of those moments - we had Llama, and were experimenting with Vicuna and Falcon 180B, and then he said “hey, this Mistral thing, everybody's talking about it, this is going to be huge”. He started calling everybody saying “these guys are going to be huge!” and the next thing we know, they were. But internally, we already had it compiled. So we try to keep him out in front, listening and wayfinding, while understanding that he's also the person that challenges us.

Going back to those culture values, he's truly a first-principles type of leader who influences the decisions that happen at Groq. For example, a lot of experts suggested not using SRAM, as it’s an expensive component of your design. Jonathan was convinced that it would absolutely be a key ingredient in accomplishing the deliverable he saw the industry required. Among other things, he’s really good at not letting you be comfortable.

Lastly, what are Groq’s priorities for the next 6-12 months? What are you hoping to achieve by the end of 2024?

I’d say the first priority is very clear. We're building a token factory, of sorts, and I think everyone understands that the demand is clear - even Jensen just came out and said that 40% of NVIDIA’s AI revenue is inference! So, we're working to build a ‘token factory’ that meets a huge slice of that demand and creates more capacity for all of these amazing developers and open source communities out there. That's priority number one - how do we get everybody over onto Groq and having this experience? 

The second priority is just continuing to evolve and mature our product - for example, what can we do with the single API, multiple APIs, or in the cloud? And the third priority is continuing to partner with all of the model developers. Until this moment, we've been aggregating, but we've proven that we've got something that empowers the model developers to call us and say “if we're going to build our next model, what should we know about Groq that could make it even better?” I think that's going to be the third stage of our focus - asking what if developers were building for Groq from the start, versus looking into how to switch from GPUs to Groq down the line.

Lastly, priorities aside, I would just invite everyone to be curious. We see a lot of the AI community talking about Groq as if we're a GPU company and evaluating us in the way that GPUs have always been evaluated. We've got a lot of documents we're going to be publishing in the next few weeks that explain our power economics, and so on and so forth - but there are already a lot of docs on our website that folks can use to learn about why LPUs are advantaged. I invite people to stay curious!

Conclusion

To stay up to date on the latest with Groq, follow them on X (@GroqInc) and sign up at Groq.com.

If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email ([email protected]) or DM us on Twitter or LinkedIn.