Apify is Building the Infrastructure for AI's Data Problem 🕷️
Plus: Founder and CEO, Jan Curn, on why tool marketplaces, RAG pipelines, and containerized data extraction will be the most critical foundations for scalable AI applications...

CV Deep Dive
Apify is the world's largest marketplace of tools for web scraping, data extraction, and browser automation. Founded in 2015 by Jan Curn and his co-founder Jakub Balada and built on containerized infrastructure, it provides more than 6,000 Actors that can extract data from social media, e-commerce sites, search engines, online maps, or any other website. Apify's goal is to enable organizations to automate workflows and populate them with web data, particularly for AI applications that need external context beyond what's available in language models.
Today, Apify's users range from Fortune 500 companies to independent developers, and they rely on it for everything from competitive intelligence to populating RAG pipelines and vector databases. Its fully integrated platform, including compute infrastructure, proxies, and a marketplace with built-in monetization, makes it a standout solution for businesses that need reliable web data at scale. The company has reached $25 million ARR with more than 15,000 customers worldwide, largely through bootstrapped growth.
In this conversation, Jan shares how Apify was founded, the critical role of web scraping in modern AI applications, and their vision for scaling through a developer marketplace that's approaching half a million dollars in monthly payouts.
Let’s dive in ⚡️
Read time: 8 mins
Our Chat with Jan 💬
Jan, welcome to Cerebral Valley! First off, introduce yourself and give us a bit of background on Apify. What led you to co-found Apify back in 2015?
Hi there! My name is Jan Curn and I'm the founder and CEO of Apify. Apify is the world's largest marketplace of tools for web scraping, data extraction, and browser automation. We started Apify back in 2015. My co-founder, Jakub, and I went through the Y Combinator Fellowship together, and we've been building Apify ever since. We raised a little capital, but not much.
A big part of our journey has been bootstrapping. Now, for context, we’re at $25 million ARR with a team of about 160 people. We have more than 15,000 customers and 40,000 active developers around the world using the platform. It's been a long and interesting journey!
How would you describe Apify to the uninitiated developer or AI team?
We provide a marketplace of more than 6,000 tools we call Actors for all kinds of use cases. For example, there are Actors that can extract data from social media for marketing analytics or sentiment analysis. Other Actors let you extract product data from e-commerce sites for competitive intelligence or dynamic pricing. There are also Actors that fetch data from search engines or online maps for lead generation. Another popular and fast-growing use case of Apify is extracting data for AI, both for training new models and for providing AI apps with context fetched from the web.
Google Maps is one of the richest sources of local business data 📍
With Apify Google Maps Scraper, get:
∙ Business contacts
∙ Employee leads
∙ Social media profiles
∙ Ratings & reviews
∙ Custom search areas...and more, all from a single tool. Watch the tutorial to see
— Apify (@apify)
3:27 PM • Aug 26, 2025
For those who haven't worked with web scraping platforms before, what makes this space so critical for modern AI applications?
Without web scraping, there would be no generative AI revolution. All major LLMs were trained on data scraped from the open web. But the models on their own are limited by their knowledge cutoff. To build useful AI apps or agents, you need to provide the models with relevant, up-to-date context, which often requires extracting data from the web.
Let’s say you’re building an AI chatbot. To make it useful, you need to give it some documents with up-to-date knowledge, for example, a company’s website, blog, or knowledge base. You can use Apify to crawl such websites and extract this data, store it in a vector database, and connect it to your RAG pipeline or agent. Similarly, you can give the AI chatbot access to current weather forecasts, posts on social media, places from a map, or menus from local restaurants… whatever makes for good context.
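To make the crawl, store, and retrieve flow described above concrete, here is a minimal, self-contained Python sketch of the retrieval half of a RAG pipeline. It is not Apify code: a toy bag-of-words vector stands in for a real embedding model, and the hypothetical `InMemoryVectorStore` class stands in for a vector database. In practice, the page texts passed to `add_page` would come from a website crawl.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split crawled page text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a real vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter, str]] = []  # (source URL, vector, chunk text)

    def add_page(self, url: str, text: str) -> None:
        # Index every chunk of the page under its source URL.
        for chunk in chunk_text(text):
            self.items.append((url, embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        # Rank all chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [(url, chunk) for url, _, chunk in ranked[:k]]


def build_prompt(question: str, store: InMemoryVectorStore) -> str:
    """Prepend the retrieved chunks, with their sources, to the user's question."""
    context = "\n".join(f"[{url}] {chunk}" for url, chunk in store.search(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Swapping in a real embedding model and vector database changes `embed` and the store, but the shape of the flow (chunk, embed, search, then prepend context to the prompt) stays the same.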
Who would you say is finding the most value in what you're building with Apify - are you seeing more traditional developers, AI researchers, or enterprise data teams gravitating toward the platform?
Apify is a horizontal platform with a very diverse customer base. It’s used by Fortune 500 companies for large projects as well as by independent researchers or students who just pay maybe $40 per month.
Most of our customers are small to medium-sized businesses that want to automate workflows and feed those workflows with data. Historically, our users were mainly developers who could connect various APIs, but over time, as no-code automation tools like Zapier, Make, or n8n exploded, Apify’s target audience expanded. Now, with AI agents and vibe coding tools, practically anyone can set up such software systems with Apify.
Big news: Apify’s native integration with @n8n_io is now live in the cloud! 🩷
Run Actors, trigger workflows from scrape events, and push data into 500+ n8n tools with no HTTP requests needed.
Try it now → just type "Apify" in the node bar. Plus link in 🧵
— Apify (@apify)
2:50 PM • Jul 28, 2025
Talk to us about some existing use cases for Apify. With over 6,000 Actors in your store now, are there any interesting customer stories that you'd like to highlight?
One of our customers is Intercom, which built Fin.ai, a leading AI chatbot for customer support. They partnered with Apify to crawl the web to feed their chatbot with customer knowledge. This helped the Fin team focus on building their unique AI chatbot experience rather than wasting time building and scaling web crawlers.
Another customer is Groupon, which uses Apify's web data collection to find and reach new merchants and thus expand its business.
We also have a long-term partnership with Spotlight, a product of a US nonprofit. It's an AI tool that law enforcement agencies use to find missing children. Apify crawls various classified-ad and escort-service websites, extracting details including images, and feeds them into the Spotlight database. When someone goes missing, the police upload a photo of that person to the database, and the AI finds where that person has appeared on the web.
Years ago, we heard that Spotlight had identified more than 17,000 victims of child trafficking. I'm proud that our technology can contribute to such a good cause, not just help companies make more revenue.
We would love to walk through your platform. Which use cases should new users and customers experiment with first - would you recommend they start with your pre-built scrapers or jump into building custom Actors?
It's best to start in Apify Store and search for an Actor for your use case, for example, one that extracts data from TikTok or Amazon. Given that there are more than 6,000 Actors on our marketplace, there is a good chance you will find one that solves your problem.
If you don't find an Actor for your use case, you can build your own. Apify provides open-source tools like Crawlee, SDKs, templates, and docs to make this as easy as possible. And if your use case is popular, you can publish your Actor on Apify Store and monetize it. Last month, we paid our community creators almost half a million dollars, more than 4x what we paid a year earlier.
There are a number of companies working in the web scraping and data extraction space. What sets Apify apart, and how do you differentiate from both traditional scraping tools and newer AI-powered alternatives?
Our most distinctive characteristic is that Apify is a fully integrated platform where you can find all the tooling you need to run these scrapers. We provide raw compute and storage infrastructure, proxies fully integrated into the platform, and, on top of this, a marketplace with thousands of Actors for all kinds of use cases that are easy to integrate into your code or any external system.
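A custom scraper usually boils down to fetching pages and turning HTML into structured records. The sketch below shows only that parsing step, using Python's standard-library `html.parser`; it is not Crawlee or the Apify SDK, and the `extract` helper and its output fields are hypothetical. A framework such as Crawlee would add the parts omitted here, such as request queueing, retries, and proxy handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects a page's title, visible text, and absolute link URLs."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links: list[str] = []   # candidate URLs a crawler could enqueue next
        self.chunks: list[str] = []  # visible text fragments in document order
        self._skip_depth = 0         # > 0 while inside <script>, <style>, or <noscript>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            # Ignore in-page anchors and non-HTTP schemes; resolve relative URLs.
            if href and not href.startswith(("#", "mailto:", "javascript:")):
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract(html: str, url: str) -> dict:
    """Turn one HTML page into a structured record (hypothetical output schema)."""
    parser = PageExtractor(url)
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "text": " ".join(parser.chunks), "links": parser.links}
```

Keeping parsing separate from fetching like this makes the extraction logic easy to test on saved HTML, independent of the network layer.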
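As an illustration of integrating a Store Actor into your own code, the sketch below prepares a call to Apify's HTTP API using only the Python standard library. It assumes the API v2 `run-sync-get-dataset-items` endpoint and Bearer-token authentication; check the current Apify API reference before relying on it. The helpers `build_actor_request` and `run_actor` are hypothetical, and the Actor ID and `startUrls` input used with them are examples, since each Actor defines its own input schema.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed Apify API v2 base URL


def build_actor_request(actor_id: str, actor_input: dict, token: str) -> urllib.request.Request:
    """Prepare a synchronous Actor run that returns the run's dataset items.

    In API paths the Actor ID is written as "username~actor-name", so "/" becomes "~".
    """
    path_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(actor_id: str, actor_input: dict, token: str, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of dataset items (network call)."""
    request = build_actor_request(actor_id, actor_input, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For instance, `run_actor("apify/website-content-crawler", {"startUrls": [{"url": "https://docs.apify.com"}]}, token)` would block until the run finishes and return its results, assuming that Actor accepts this input. Separating request construction from sending keeps the URL and payload logic testable offline.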
A lot of our competitors focus on just some part of this whole stack. The Apify platform combines all of these end-to-end to give our users far more flexibility to build new things and automate their workflows.
Could you share a little bit about how Apify’s platform actually works under the hood in this AI era? Give us the reasoning behind some of the architectural decisions you’ve made.
We use a container-based infrastructure where each job is isolated from the others to provide reliability and security. Most of the stack is built in TypeScript because we made an early bet on JavaScript at a time when it wasn't the most common choice for backend infrastructure.
Running our own container orchestration platform helped us balance performance and cost. We put years of R&D into building this platform, making sure the containers are fast, reliable, and scalable.
95% of Apify Actor runs start in under 1s ⚡️
How we did it:
▸ 7s → 1.2s median startup
▸ Smarter scheduling
▸ Layered caching
▸ 62% lower IOPS
▸ 5× faster cold starts
Full breakdown in 🧵
— Apify (@apify)
6:32 PM • Aug 14, 2025
What would you say has been one of the hardest technical challenges in building Apify into the all-out platform it is today?
The fact that you're building software to run other people's software, which is an order of magnitude more challenging than building “just” the software. You don't have control over what they do and which resources they access. We have seen user workloads that created millions of files, opened millions of sockets, or pushed the storage system to extremes.
It's been fairly challenging to ensure those executions are isolated from others and that the system can scale to millions of container runs per day.
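Apify isolates runs in containers; the snippet below is only a single-machine analogy of the same idea, not Apify's implementation. It runs untrusted Python code in a child process with POSIX resource limits (`RLIMIT_NOFILE` caps file descriptors, which covers both files and sockets; `RLIMIT_CPU` caps CPU time), so a workload that tries to open thousands of files fails on its own without exhausting the host. It works on Linux and macOS only, and the `run_isolated` helper is hypothetical.

```python
import resource
import subprocess
import sys


def run_isolated(code: str, max_open_files: int = 64, cpu_seconds: int = 5) -> subprocess.CompletedProcess:
    """Run untrusted Python code in a child process with hard resource limits (hypothetical helper)."""

    def apply_limits() -> None:
        # Runs in the child after fork() and before exec(), so only the child is limited.
        # RLIMIT_NOFILE bounds file descriptors, which covers both files and sockets.
        resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, max_open_files))
        # Once the child uses this much CPU time, the kernel sends SIGXCPU, which terminates it by default.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        # A wall-clock timeout also catches jobs that sleep or block instead of using CPU.
        timeout=cpu_seconds * 2,
    )
```

Containers go further than rlimits by also isolating the filesystem, network, and process namespace, which is why running other people's code at scale needs more than this.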
How do you foresee the Apify platform evolving over the next 12-18 months? Any product developments that your biggest users should be most excited about?
We're extremely bullish on the marketplace aspect of the Apify platform. As the internet economy grows, there will be ever more room for new niche Actors, be it tools for AI or AI-native tools. Regardless of how such software gets built, people or agents will always need compute, storage, integrations, and monetization to make it easy to discover and use such tools. Our goal for next year is to attract more developers to Apify and show them how they can make money selling their software online.
We’re also working on making it very easy to use Apify and Actors from any automation or AI tool. This means we’re building new integrations, a state-of-the-art MCP server, and working on supporting agentic payment protocols such as Coinbase’s x402, Skyfire, or Cloudflare’s pay-per-crawl.
Lastly, tell us a bit about the team at Apify. How would you describe your culture, and are you hiring? What do you look for in prospective team members joining the company?
Our team is about 160 people, mostly in Europe, with headquarters in Prague. We have an open culture that values flexibility and responsibility. People can work from home or come to the office. We have a really nice loft in the center of Prague with beautiful views and free lunches. So far this year, we organized or participated in maybe 50 different meetups, conferences, or events around the world!
Earlier this year, we launched a San Francisco mission because that's the epicenter of the AI revolution. We want to connect more with the AI and developer community in the Bay Area. Subscribe to our calendar and join us at one of our upcoming events: https://luma.com/apify
Just got back to Europe after a quick but intense trip to San Francisco for the AI Engineer World’s Fair.
My talk about the rise of the agentic economy was featured on the main stage, which was both exciting and slightly terrifying 😅 Fun fact: I submitted it through the
— Jan Čurn (@jancurn)
11:28 AM • Jun 9, 2025
If you’re interested in joining our team and helping people get more value from the web, check out our careers page at https://apify.com/jobs
Conclusion
Stay up to date on the latest with Apify, follow them here.
If you would like us to ‘Deep Dive’ a founder, team or product launch, please reply to this email or DM us on Twitter or LinkedIn.