Vana is championing 'user-owned AI' for the masses
PLUS: CEO Anna on decentralization, AI alignment and Vana's mission...
CV Deep Dive
Today, we're talking with Anna Kazlauskas, Co-founder and CEO of Vana.
Vana is a decentralized platform that lets people reclaim their data and make it portable across applications. Founded by Anna and co-founder Art in 2021, Vana is built on the idea of "user-owned AI", where users run a node and host their own data on Vana, bringing it with them to different apps, and can even combine their data with others to create collectively owned models. Short for "Nirvana" - a play on freeing data - Vana aims to be the foremost digital playground for your AI to build relationships, do economic work and live freely from Big Tech's data silos.
Today, Vana has millions of users on its platform, using it for a range of activities from self-exploration to participating in data collectives for building user-owned AI. The startup also went viral last year with people wanting to turn their data into AI clones for use across social media. The startup has raised funding from Paradigm Capital, Polychain Capital, Packy McCormick and more to build the digital future for your AI clones.
Reddit makes a bunch of money selling your work to foundation model companies.
Which is cool. But, like, it's your work.
Share your Reddit data, get ownership in a pool of Reddit data, own the fruits of your posting labor.
- Packy McCormick (@packyM)
4:42 PM • Apr 3, 2024
In this conversation, Anna walks us through the founding premise of Vana, why decentralization is a critical component of the AI revolution, and her goals for the next 6-12 months.
Let's dive in
Read time: 12 mins
Our Chat with Anna
Anna - welcome to Cerebral Valley. First off, give us a bit of background on yourself and what led you to start Vana?
Hey! I'm Anna, Founder of Vana. A little bit about myself: I've always been super interested in the world of programming and modeling the world with data. I learned to program on a graphing calculator in middle school, and in high school I got really into economics and how central banks work. I spent time interning at the Fed before going to MIT, where I got really into decentralization and how it could be used to impact currencies and markets. I later went to the World Bank, where I ended up automating a bunch of their document processes with ML - and before I knew it, I was selling this document-sorting software I'd created to government agencies. So, I ended up dropping out of MIT and going through YC in 2017, which is when the Transformers paper came out.
While at MIT, I was taking classes with Regina Barzilay, who was a leader in the NLP space. It was still early days for the generative models we're seeing today, but I saw that the only thing that matters for these models is data. If we have better data to train them, we get much better models - and so ultimately, the thing that's important is having very high quality data to create models from. If you look at how data is owned today, it mostly sits in the silos of Big Tech, and so they're really positioned to build super-powerful AI. But, there are ways where you can have the same decentralized approach that people took with currencies and apply it to data.
This is what Vana is doing - we're focussed on the question of: how do we use the tools that have worked really well for decentralizing finance and apply them to data in AI? How can we make a user-owned foundation model, where people are still in control of their own AI and avoid sitting in Facebook-jail because the platform didn't like what they were saying?
Fast-forward, I met my co-founder Art while doing my undergrad at MIT. He comes from a legal background and was doing grad school at Harvard, and was previously selling data to large companies like Facebook and figuring out how to get people to directly sell their data. So we're both pretty deep in this world of data ownership!
Launching the world's first data DAO, focused on Reddit data, on the Vana network: @rdatadao
- Anna Kazlauskas (@anna_kazlauskas)
4:04 PM • Apr 3, 2024
Describe Vana to someone who hasn't heard of it before. What does having personalized AI mean in its fully realized form?
Vana is a decentralized platform that lets people reclaim their data so that it is portable across applications and can fuel the creation of AI built on collective data. We think of it as "user-owned AI" - comparable to Urbit or Solid Project, which are personal server architectures where you can run a node on your own and host your data, and then bring it with you to different apps. Vana is similar, plus a permissions and incentives layer that makes it so that even if most users don't want to self-host, they still have a way to interact and maintain that same agency.
One term that you hear a lot in AI is the "alignment problem" - meaning, how do we align AI with human values? But, the reality is that every single human being has different values - so the idea of making one AI that represents all of our values seems literally technically impossible.
From my perspective, everyone should solve their own alignment problem of having their AI exist in a certain way, maybe by giving it a whole bunch of context on yourself and your past via your notes or messages, for example. The overall premise is around customizing AI to have a really deep understanding of yourself, your values, your preferences and your experiences. With this, you could unlock completely new applications over the next 3-5 years - for example, something as fun as watching a Netflix show about you and your friends, all the way to having a very intense debate with an AI version of yourself.
With AI, it really matters how these models are trained, and you want to avoid having it be censored in a way that you disagree with or that feels biased. One example of this going wrong is Google Bard, which attempted to rewrite history on top of being blatantly offensive. When AI becomes our source of truth, whoever has that AI shouldn't be the one who decides what is going to be true for everybody. We think that everyone should choose their own truth and have that control and agency over their own model.
As we start to rely on AI models more and more, they become our source of truth. We shouldn't let a single company control that truth. Google's AI is a recent example of this - do you want extreme wokeness to rewrite history?
AI should be owned and controlled by users.
- Anna Kazlauskas (@anna_kazlauskas)
4:36 PM • Mar 8, 2024
Who are Vana's users today? Who's finding the most value in what you're building?
Two years ago, when we were building these tools to help you bring your data across applications, people were like "What would I use this for? Why would I care about portability or bringing my data together?" Now, as we've started to see generative AI models come to life, that's unlocked a lot of user interest in us. A lot of our users harness their data to create models of themselves - for example, image models are really popular right now, and we also support voice, text and personality.
One unexpected emergent use-case is people from our community pooling their data in collectives for the creation of user-owned foundation models. Just this week, the world's first Data DAO, r/datadao (www.rdatadao.org) launched on the Vana network. It was incredible to see how Vana users are so mission-aligned. The DAO was built to protest against Reddit selling user data to Google for $60 million a year and has already hit over 20k sign-ups. Experiencing waves of virality has been a huge learning for me in building consumer products. How can you build a tech stack and a team that can scale and rise to these special moments? If you unlock that, then you can capitalize on these unexpected wins.
We want to offer users as much control over their data as they want, so we also have a self-hosted option that lets users run Vana from their MacBook - and we've partnered with Replicate for GPU access. The crowd that does this is definitely more tech-forward and hobbyist - many of the locallama crew, for example, who use it for self-understanding, searching their data, and memory-type stuff. This feature is still quite early; we just released it a few weeks ago.
Lastly, we have some really awesome AI consumer devs building on our Vana API, which lets you onboard a user's data and model much more easily than having to onboard them within your own application. One app that's taken off here is Chirper, which is a social network only for AIs - no human beings allowed. Chirper lets people create fictional characters that live autonomously in their digital world, which fits in nicely with Vana, where we're focused specifically on exploring the boundaries of agency and freedom with AI.
I'm excited to see new use cases emerge for Vana as everyday people become more familiar with the possibilities in decentralized AI and devs become more familiar with the image API, the text API, and the underlying data too. There is a great deal of interest right now in spinning up other data collectives to build user-owned foundation models. We also have some app partnerships in the works in the music space, where they just need a voice recording and you can create a rap song of yourself using your voice or your friend's voice. This is going to be super cool.
The current path of society is to allow big tech to take our data and use it to train AI models that do our jobs. The only way to prevent this is through collective action. Data is currency, and collective data is power.
- Anna Kazlauskas (@anna_kazlauskas)
4:04 PM • Apr 3, 2024
How do you see Vana's position in the AI space evolving over the next 6-12 months, given the insane pace of AI breakthroughs happening on a weekly basis? What are you most excited about?
The first thing I'd highlight is around cost, which is such a bottleneck in AI today. If you're a consumer app that goes viral, congrats - you now have a $50k bill you owe AWS. Of course, the local, self-hosted models enable much more interesting applications because you're no longer cost-constrained; today, though, so much energy is going towards cost reduction, which is limiting innovation. For example, agent products that require 100 OpenAI API calls for every interaction are going to be way too expensive for individuals to work on themselves. So, I'm definitely excited about dropping the cost to unlock cool applications on Vana.
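To make the cost point concrete, here is a rough back-of-the-envelope sketch. Only the figure of 100 API calls per interaction comes from the conversation; the per-call price, usage level and user count are made-up placeholders rather than real pricing, and the function name is purely illustrative.

```python
def monthly_agent_cost(calls_per_interaction: int,
                       price_per_call_usd: float,
                       interactions_per_user: int,
                       users: int) -> float:
    """Estimate one month's hosted-API bill for an agent-style product."""
    total_calls = calls_per_interaction * interactions_per_user * users
    return total_calls * price_per_call_usd


# 100 calls per interaction is from the interview; everything else is hypothetical.
cost = monthly_agent_cost(
    calls_per_interaction=100,
    price_per_call_usd=0.01,   # assumed placeholder price, not a real rate
    interactions_per_user=20,  # assumed interactions per user per month
    users=1_000,               # assumed number of active users
)
print(f"Estimated monthly bill: ${cost:,.2f}")
```

Because the bill scales with the product of all four inputs, even a modest hobby project reaches five figures under these assumed numbers, which is the bottleneck Anna describes.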
The other set of applications that I'm excited about are around AI creating economic value - what does it mean when your AI can earn money for you and actually do "work"? I think an early successful version of this was CarynAI - influencers scaling themselves in NSFW industries, which is often where emerging technologies will find their first use-cases. That said, I think there's a huge opportunity for people to scale themselves in an interesting way, and have their AI go join the workforce and earn money.
It is great to see data collectives emerge on Vana, signaling a future of user-owned foundation models. These collectives allow you to own a piece of the core technology that powers your AI. It makes it possible to build AI like open source software in a way that benefits everyone who contributes. The technical architecture of data DAOs can be applied to model DAOs, where users and developers contribute data, compute, and research in exchange for ownership and use of the model. I'm excited about collectively owned models, especially as model merging techniques advance to allow a distributed group of users to train large, capable models.
Today, models are primarily trained on the publicly scraped internet. What if 100M users contributed their private data from siloed platforms to create a user-owned foundation model?
- Anna Kazlauskas (@anna_kazlauskas)
3:34 PM • Mar 4, 2024
What's the hardest technical challenge around what you're building with Vana?
Some of our biggest questions are: how do you personalize models and make data portable in a way that works across applications? How do you modify the model so that it's usable, but also be able to store some fraction of the amount of information we have in our human brains? What does it mean for my model to be a personal model? Broadly, our biggest challenge is around model personalization.
The other question we're thinking about is: how can you use model personalization to create better models across many users? The furthest out example of this is making a better user-owned foundation model where you have 100 million people contributing their personal piece of the model, and stacking them together. For example, how do you get 100 of the best psychologists to train an amazing model on their notes and their process and combine it all together? This is hard to do even if you have all the data in the same place, but it's even harder if you want to do it in a distributed way where you're training part of the model on every single person's device, such that you can have a really strong privacy guarantee. That's just a really hard technical problem.
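One well-known way to combine models trained on separate devices is federated averaging, where each device trains locally and only the resulting parameters are merged. The sketch below illustrates that general idea only; the conversation does not say which technique Vana uses, and the function name, the toy two-parameter "models" and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Merge per-device parameter vectors, weighting each by its data size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")

    merged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this device's fraction of all training samples
        for i, value in enumerate(weights):
            merged[i] += value * share
    return merged


# Three hypothetical devices, each holding a locally trained 2-parameter model.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
samples_per_device = [10, 10, 20]
global_model = federated_average(local_models, samples_per_device)
print(global_model)
```

Raw data never leaves a device in this scheme, only parameters do. Averaging on its own is not a formal privacy guarantee, though, since parameters can still leak information, which is part of why the problem described above is hard.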
Broadly speaking, our biggest question is how do we make data portable and easy to use? We're working with a super-personal dataset that we also have to keep secure, and we do a lot of client-side encryption to achieve that. How do you get that portability while keeping things convenient, all with strong security guarantees?
So important to own the models created from our wisdom, especially as AI starts to do valuable economic work
- vana (@withvana)
7:50 PM • Apr 3, 2024
Tell us how you navigate the choppy waters of data privacy and controls, when users are porting over their personal data in order to make the AI clones of themselves realistic?
One thing to note is that we want to be very careful around how we communicate the data security aspects of what we do to the user - because questions around data privacy and security do tend to spook a lot of people. As a product person, you actually want your product to work super seamlessly and then have strong security guarantees.
If you want the strongest guarantee, you can actually run your model locally and then none of your data will ever leave your machine. Your personal model stays secure, as you're running inference locally and then returning the output, and you can use your personal model across the different applications. So, if you want to use an AI dating app that simulates your future with another person, but you also don't want to upload all of your messages and journal entries and the history of all your breakups into a random application, that gives you a strong guarantee.
For the hosted version of the application, we encrypt all of the data and then put the users in control. The other thing I would mention is actually having strong terms of service and our privacy policy that both state that the user owns their data and models - all of it is theirs. I think that's a piece that people often don't think about because they just gloss over it, but you can actually give people very strong guarantees from a legal perspective too, in addition to the technical side of things.
Describe the culture of your team today. Are you hiring, and what do you look for in prospective team members?
We're currently a team of 14 people - we like keeping a very small team, and we're very build-heavy. Culture-wise, I'd say that everyone is very mission-driven - we all believe that users should own their data, and that's somewhat of an ideological foundation for us. We're also really excited by the challenge of what we're working on - a user-owned foundation model trained by 100 million people is not like a YC SaaS company that you sell in six months. This is going to take a while, and I think everyone's aligned towards that.
Lastly, I'd say we try to foster a culture of kindness and love throughout the team, which I'm very grateful for - like, how can you give a little more love to everyone throughout their day? This is super important to us.
Conclusion
To stay up to date on the latest with Vana, follow them on X (@withvana) and learn more about them at Vana.
If you would like us to "Deep Dive" a founder, team or product launch, please reply to this email or DM us on Twitter or LinkedIn.