The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li
1. Key Themes
The Genesis of Modern AI: From ImageNet to Deep Learning
Dr. Fei-Fei Li identified a critical missing ingredient in AI development: big data. As she explains, "I think my student and I conjectured that [a] very critically overlooked ingredient of bringing AI to life is big data" [00:16:29]. This insight led to ImageNet, which combined 15 million images across 22,000 concepts. The breakthrough came in 2012 when "a group of Toronto researchers led by Professor Geoff Hinton participated in [the] ImageNet challenge, used the ImageNet big data and two GPUs from Nvidia and created successfully the first neural network algorithm" [00:19:48]. This combination of big data, neural networks, and GPUs became "the golden recipe for modern AI" [00:19:31].
Spatial Intelligence: The Next Frontier Beyond Language Models
Dr. Li argues that spatial intelligence is fundamentally different from, and complementary to, language models. "Humans are deeply visual animals. We can talk...but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se" [00:15:24]. She illustrates this with a first-responder scenario: "If you immerse yourself in a scene and think about how people organize themselves to rescue people, to stop further disasters, to put down fires, a lot of that is movements, is spontaneous understanding of objects, worlds, human situation awareness. Language is part of that, but a lot of those situations, language cannot get you to put down the fire" [00:32:52].
AI as a Human-Centered Technology
Dr. Li emphasizes human agency and responsibility in AI development: "I believe that whatever AI does, currently or in the future, is up to us. It's up to the people" [00:06:47]. She stresses that "there's nothing artificial about AI. It's inspired by people, it's created by people and most importantly, it impacts people" [00:08:22]. This philosophy extends to everyone: "Everybody has a role in AI. It depends on what you do and what you want. But no technology should take away human dignity" [01:15:59].
2. Contrarian Perspectives
AI Was Considered a "Dirty Word" Less Than 10 Years Ago
Contrary to today's AI hype, Dr. Li reveals that "in the middle of 2015, middle of 2016, some tech companies avoid[ed] using the word AI because they were not sure if AI was a dirty word" [00:21:43]. She actively encouraged companies to use the term, and notes that "2017-ish was the beginning of companies calling themselves AI companies" [00:22:37]. The reversal is striking: just seven or eight years later, nearly every company positions itself as an AI company.
The "Bitter Lesson" Won't Work Alone for Robotics
While the AI community embraced the "bitter lesson" (Rich Sutton's observation that general methods that scale with data and computation ultimately outperform approaches built on hand-crafted human knowledge), Dr. Li argues robotics is different: "You hope to get actions out of robots. But your training data lacks actions in 3D worlds. And that's what robots have to do, right? Actions in 3D worlds" [00:43:56]. She compares robotics to self-driving cars, noting that self-driving has taken 20 years to get from Stanford's 2005 DARPA Grand Challenge win to today's Waymo, and "self-driving cars are much simpler robots. They're just metal boxes running on 2D surfaces, and the goal is not to touch anything. Robot is 3D things running in 3D world and the goal is to touch things" [00:46:32].
Humans Are More Impressive Than AI
Dr. Li offers a humble perspective from someone at the frontier of AI: "We operate on about 20 watts. It's dimmer than any light bulb in the room I'm in right now. And yet we can do so much. So I think actually the more I work in AI, the more I respect humans" [00:47:49]. This stands in contrast to the common narrative that AI will soon surpass human capabilities across all domains.
Don't Overthink Career Decisions
In an era of optimization and analysis paralysis, Dr. Li advises: "I do find many of the young people today think about every single aspect of an equation when they decide on jobs...sometimes I do want to encourage young people to focus on what's important...where's your passion? Do you align with the mission? Do you believe [in it and] have faith in this team?" [01:09:00]. She credits her own success to being "intellectually very fearless" and not overthinking "all possible things that can go wrong, because that's too many" [01:08:31].
3. Companies Identified
World Labs
Description: Frontier AI company focused on spatial intelligence and world modeling, founded by Dr. Fei-Fei Li with co-founders Justin Johnson, Christoph Lassner, and Ben Mildenhall in 2024.
Quote: "We believe that spatial intelligence and world modeling is as important [as], if not more [than], language models and complementary to language models. So we wanted to seize this opportunity to create [a] deep tech research lab that can connect the dots between frontier models [and] products" [00:49:02].
Product: At the time of the interview, World Labs had just launched Marble (marble.worldlabs.ai), "the world's first generative model that can output genuinely 3D worlds" [00:50:24], which lets users create navigable, immersive 3D environments from prompts.
Scale AI
Description: Data-labeling company that, according to Dr. Li, was inspired by ImageNet.
Quote: Dr. Li mentions "Alexandr Wang from Scale, very early days, I probably still have his emails when he was starting Scale. He was very kind. He keeps sending me emails about how ImageNet inspired Scale" [00:20:57].
Nvidia
Description: GPU manufacturer that became critical to AI development.
Quote: The first major deep learning breakthrough used "two GPUs from Nvidia" [00:19:06]; today's AI training runs on vastly more GPU compute, much of it supplied by Nvidia.
4. Operating Insights
Start with Intentional Design Features That Delight Users
World Labs added a visualization feature to Marble that shows dots before the full 3D world renders. Dr. Li explains: "The dots that lead you into the world was an intentional feature visualization. It is not part of the model...we were trying to find a way to guide people into the world" [00:51:52]. When users found it delightful (some compared it to The Matrix), that validated the idea that thoughtful product design around core technology matters. This suggests companies should intentionally design transitional experiences rather than focusing only on the final output.
Launch Early to Discover Use Cases
Dr. Li shares unexpected applications for Marble: "A psychologist team called us to use Marble to do psychology research. It turned out some of the psychiatric patients they study, they need to understand how their brain responds to different immersive scenes" [00:55:48]. They also found demand in "virtual production for movies," where Marble cut "production time by 40X" [00:53:54]. The lesson: even frontier AI companies don't fully know their product's use cases until users get their hands on it.
Build Integration Between Deep Tech and Product From Day One
World Labs structured itself differently: "We have a team of 30-ish people now, and we are predominantly researchers, or research engineers. But we also have designers and product. We actually really believe that we want to create a company that's anchored in the deep tech of spatial intelligence, but we are actually building serious products" [00:51:41]. This integrated structure helped the team go from founding to launching a world-first product in just 18 months.
5. Overlooked Insights
The Alignment Problem Between Training Data and Output in Physical AI
Dr. Li reveals a fundamental technical challenge that's rarely discussed: "Even for a researcher like me, I'm very jealous of my colleagues in language because they had this perfect setup where their training data are in words, eventually tokens. And then [they] produce [a] model that outputs words. So you have this perfect alignment between what you hope to get, which we call objective function and what your training data looks like" [00:43:31].
In contrast, for robots and spatial AI: "You hope to get actions out of robots. But your training data lacks actions in 3D worlds" [00:43:56]. This misalignment between training data (mostly passive video) and desired output (active manipulation in 3D) is a structural challenge that is unlikely to be solved by simply scaling existing approaches. It could significantly lengthen the timeline for advanced robotics, and it suggests that companies working on synthetic data generation and world models may be more valuable than commonly recognized.
The 20-Watt Brain as a North Star Metric
Dr. Li casually mentions "We operate on about 20 watts" [00:47:49] when discussing human intelligence, a point that is easy to miss in the conversation. Today's AI models run in data centers drawing megawatts of power, yet they achieve narrower capabilities than a human brain running on less power than a light bulb. This efficiency gap may indicate that we are still far from truly intelligent systems and that fundamental architectural insights are still missing. For investors and builders, it implies that breakthroughs in AI energy efficiency could be as transformative as algorithmic advances, yet they receive far less attention and funding.