No Priors Ep. 143 | With ElevenLabs Co-Founder Mati Staniszewski
- 01 The Convergence of Research, Product, and Ecosystem as Sequential Moats
- 02 Voice as the Universal Interface Unlocking Multimodal Interaction
- 03 The Shift from Reactive to Proactive AI Experiences
1. Key Themes
The Convergence of Research, Product, and Ecosystem as Sequential Moats
The most defensible strategy in AI isn't infinite technical advantage, but rather a sequenced approach where research provides a head start (6-12 months), product development captures that advantage, and ecosystem effects create lasting value. "Research all it is is a head start and being able to accelerate the future closer... research is head start, this gives us advantage to the customer earlier and it's six 12 months of advantage that is also a way for us to build a right product layer" [00:33:37]. Mati explains the company's strategic framework: "research product ecosystem that we built and research all it is is a is a head start" [00:33:20]. The ecosystem includes brand, distribution, voice collections, integrations, and workflows that compound over time even as pure model advantages erode.
Voice as the Universal Interface Unlocking Multimodal Interaction
Voice represents the most natural human interface and will become the primary way we interact with technology across all contexts. The initial insight came from Poland's terrible dubbing experience: "if you watch a movie in Polish language, a foreign movie in Polish language, all the voices, whether it's a male voice or a female voice are narrated with one single character... That's a terrible experience" [00:03:57]. This evolved into a broader vision: "most of us seeing this technological evolution over the last of our last decades, but you still will spend most of your time on the keyboard. You will look at the screen, and that interface feels broken. It should be where you can communicate with the devices through speech" [00:06:10]. The company sees voice not just for static content but for real-time interaction with devices, robots, and ambient computing.
The Shift from Reactive to Proactive AI Experiences
Customer-facing AI agents are evolving beyond reactive support into proactive assistants embedded throughout the entire user journey. Mati describes the evolution with Meesho, India's largest e-commerce platform: "they started working on the customer support side where I want to refund. I want to see the tracking of the package to actually having an agent be a front part of the experience... you can ask it, hey, can you help me navigate to item X item Y or can you explain what's the right thing for me to give up for a gift" [00:18:52]. This pattern extends across industries, from Square, where voice ordering is growing into a full discovery experience, to interactive education replacing static lessons.
2. Contrarian Perspectives
Audio Models Don't Require Massive Scale—They Need Architectural Breakthroughs
Counter to the conventional wisdom that AI requires massive compute and training scale, audio models primarily need talented researchers making architectural innovations. "The main part that I think is different in audio space is that you don't need the scale as much as you need the architectural breakthroughs the model breakthroughs to really make a dent" [00:28:52]. Mati estimates there are only "maybe 50 to 100 researchers in audio space that could do it we think we have probably 10 of them in the company" [00:29:16]. In his telling, this is why ElevenLabs beats larger labs on benchmarks despite having fewer resources: it comes down to having the right people obsessively focused on the problem, not throwing compute at it.
Voice Quality is Subjectively Unmeasurable—Creating a Data Labeling Problem
Unlike text or image AI where benchmarks are relatively well-established, audio quality is highly subjective and voice-dependent, making traditional evaluation frameworks inadequate. "You have so you're the benchmarks, you have like how do I find the right voice for my audience, but even the understanding of how you describe audio data is still lagging in the industry" [00:17:13]. When they sought data labeling help, "most people just weren't able to do that work effectively because you kind of need to hear and have like a little bit of a skill set of like how would I describe this specific delivery" [00:17:29]. Even switching voices in benchmarks dramatically changes perception: "Just switching the voice makes that excited" [00:16:40], making objective quality measurement nearly impossible.
Building Both Consumer and Enterprise Simultaneously is the Right Strategy
While conventional startup wisdom says to focus on one customer segment, ElevenLabs deliberately built both self-serve creative tools and enterprise agent platforms in parallel. The company is "at 300 million in ARR, which is roughly 50-50 between self-serve... and approaching 50% on the enterprise side" [00:02:18]. This works because the company organizes around problem-specific "labs" that combine researchers, engineers, and operators around distinct use cases. "We started with effectively a voice lab... roughly five people... And then we move to the next problem... started then the second team, which was a second lab, an agent lab" [00:08:44]. Each lab can move independently while sharing foundational models, allowing the company to capture multiple markets as they emerge.
3. Companies Identified
Epic Games
Description: Major gaming company behind Fortnite
Why mentioned: Exemplary implementation of voice AI in gaming, bringing interactive characters to life
Quote: "We worked with them on bringing the voice of Darth Vader into Fortnite where millions of players could interact with Darth Vader live in the game where you had like a full experience of Darth Vader in a new way" [00:20:16]
Meesho
Description: India's largest e-commerce platform
Why mentioned: Leading example of proactive AI agents transforming from reactive support to front-of-experience discovery
Quote: "We work with the biggest e-commerce shop in India, Meesho, where they started working on the customer support side... to actually having an agent be a front part of the experience. So if you go to the website, you can you have the widget you can engage it for voice and you can ask it, hey, can you help me navigate to item X item Y" [00:18:52]
Chess.com
Description: Leading online chess platform
Why mentioned: Innovative use of voice for personalized education and training
Quote: "We recently worked with chess.com and I'm a huge fan of chess... you can learn chess but you can have Hikaru Nakamura or Magnus Carlsen be your teacher of how you deliver that which is amazing or even the Botez sisters" [00:21:01]
MasterClass
Description: Online education platform featuring celebrity instructors
Why mentioned: Pioneering shift from static to interactive learning experiences
Quote: "MasterClass who we work with to shift from you can of course have the home then we go through step by step. But you can also have like an interactive experience... working with Chris Voss, the FBI negotiator, one of the top negotiators... you can actually call him and have a practice negotiation which is crazy" [00:21:23]
Square
Description: Payment and commerce platform
Why mentioned: Enabling voice ordering and discovery experiences for businesses
Quote: "We can't kick to our work with Square that enables all the businesses to do that work exactly the same pattern. Started with voice ordering. How can now this be part of the full discovery experience too where you get items shown to you" [00:19:33]
4. People Identified
Mati's Co-founder (not named but extensively referenced)
Description: Chief Research Officer at ElevenLabs, formerly at Google; has known Mati for 15 years
Why mentioned: Described as "the smartest person I know" who built the foundational audio models that differentiated the company
Quote: "I'm in a lucky position that my co-founder and I know, for 15 years, I think he's the smartest person I know, and has been able to create a little of that research work to be able to create that foundation to then elevate that experience" [00:03:42]
Chris Voss
Description: Former FBI hostage negotiator, MasterClass instructor
Why mentioned: Example of how interactive AI transforms static educational content
Quote: "Working with Chris Voss, the FBI negotiator, one of the top negotiators... you can actually call him and have a practice negotiation which is crazy" [00:21:23]
Andrej Karpathy
Description: AI researcher and thought leader
Why mentioned: Referenced for his prediction about the coming era of AI agents
Quote: "As Karpathy says decade of agents... then you'll have a decade of robots" [00:39:43]
5. Operating Insights
The Three-Month Rule for Research vs. Product Investment
When deciding whether to wait for a research breakthrough or build a product workaround, use a three-month threshold to balance innovation with customer delivery. "Rough rule of thumb is like three months if we think it's going to be longer than three months we probably build it if it's less than that we probably wait" [00:34:17]. This prevents product teams from getting perpetually blocked by research timelines while still allowing them to benefit from imminent breakthroughs. The company keeps product teams informed of the research initiatives "so we can [parallelize] that work but we don't hold them that if a product team thinks we should deliver value to the customer by doing something different they can" [00:34:08].
The Lab Model: Small Cross-Functional Teams Around Problem Domains
Instead of organizing by function (research, engineering, product), organize around specific problem domains with integrated teams. "The way we are organized internally... was looking at the first problem and then creating effectively a lab around that problem, which is like a combination of researchers, engineers, operators to go after that problem" [00:08:16]. They started with a voice lab for narration (~5 people), then added an agents lab for interactive experiences, then a music lab in response to customer demand. This structure allows each lab to own the full stack from research to production while sharing foundational models.
Deploy Forward-Deployed Engineers Like Palantir for Enterprise Success
For complex enterprise deployments, embed engineering resources with customers rather than just selling platform access. When asked about competing with consulting firms and use-case-specific vendors, Mati explained: "My past is also in Palantir... we do blend a lot of the forward [deployed] engineering inside of the company too" [00:24:28]. This works when customers want to "deploy that across a plethora of different experiences... Then it's a great platform to build and then we effectively integrate with customers combined at platform work with our engineering resources to help [these companies] deploy" [00:25:04].
Build Voice Selection as a Service—The "Voice Sommelier"
For enterprise customers who don't know how to evaluate or choose voice quality, create a specialized role to guide them. "We have like a voice sommelier here effectively... a voice coach has an incredible voice themselves. And now we have like a team under that person that like will partner to help you find what's the right branding" [00:13:44]. This addresses the fundamental problem that "so much of whether you like or not the speech depends on the voice" [00:16:23], and customers often don't know how to articulate their needs beyond vague descriptions.
6. Overlooked Insights
Ukraine's Ministry of Digital Transformation as a Government AI Template
Buried in the conversation was a remarkable example that went largely unexplored: Ukraine is building what Mati describes as a first AI agent in government. "Recently I went to Ukraine where we are working with ministry of transformation where they are effectively creating a first agent in government... they want to re-change of how they run all the ministries" [00:22:00]. What makes this notable is the organizational model: "They have the digital transformation piece. But they have engineering leaders in each of the ministries that lead those efforts and then bring them back to that one central piece" [00:23:04]. This distributed-but-coordinated approach, which embeds technical leadership in each ministry while maintaining central coordination, could be a blueprint for how other governments implement AI, yet it received minimal discussion time.
The Long Island Voice Accent Request Reveals Hyper-Personalization Demand
An offhand anecdote points to a deeper trend in voice personalization that wasn't fully explored. "Yesterday we had a dinner with some of our partners and one of them the first thing they said is like, hey, I have a new request for you. I want a New York voice with a Long Island voice accent, which I never knew as a thing" [00:16:02]. This seemingly trivial request signals that enterprise customers are already thinking beyond basic voice quality to hyper-specific regional, cultural, and contextual voice matching. If customers are asking for Long Island accents specifically, the market for voice personalization may be far more granular and valuable than commonly understood, which suggests voice could become as personalized as visual branding is today.