#38 Karol Hausman & Kevin Black: Building A Brain For Any Robot | AI Eating The Physical World
- 01 The Intelligence Bottleneck in Robotics
- 02 The Taylor Swift Moment: A Paradigm Shift in Robot Learning
- 03 The 100-Home Generalization Discovery
1. Key Themes
The Intelligence Bottleneck in Robotics
Physical Intelligence is targeting what it identifies as the fundamental limitation in robotics: "The biggest bottleneck to the entire field of robotics is intelligence" [00:00:00]. Karol elaborates: "We still believe that the biggest bottleneck to the entire field of robotics is intelligence. It has always been intelligence, and I think it is going to be intelligence for a very, very long time. So we believe that the biggest value unlock we can provide is by solving that problem" [00:17:56].
The Taylor Swift Moment: A Paradigm Shift in Robot Learning
A seemingly simple experiment represented a massive breakthrough in 2022/2023. Karol describes it: "We had a robot and in front of the robot there was a Coke can. And then there were three pictures of different celebrities. And the prompt was to put the Coke can on the picture of Taylor Swift. And the caveat was that the robot has never seen any data of Taylor Swift...It turned out that it was able to do it" [00:06:19]. This demonstrated that robots could transfer knowledge from internet pre-training rather than having to experience everything firsthand, a fundamental shift from traditional robotics approaches.
The 100-Home Generalization Discovery
PI made a surprising discovery about how much data generalization requires. Karol explains: "The numbers that were floating were something like maybe you need to see a million homes before you are able to generalize to the million-and-first home or, you know, a billion homes or maybe, you know, a thousand homes or whatever. And it turned out that this number is very tractable. It's just a hundred homes" [00:31:59]. This finding makes the problem far more tractable than anticipated and suggests that either the models are extremely powerful or the world is less diverse than assumed.
2. Contrarian Perspectives
Intelligence Enables "Crappy" Hardware
PI challenges the traditional robotics focus on precision hardware. Karol states: "The robots that you buy, the traditional automation robots that are using industrial automation, these are really expensive machines...We see that if you have better levels of intelligence, it actually allows your hardware to be a little bit more crappy because intelligence can accommodate for that" [00:20:21]. He adds: "Some of the demos we are showing, where we are showing robots doing some of the most complex tasks we've ever seen on robots, like folding laundry or lighting a candle, busing tables...we do this on very crappy hardware" [00:21:46]. This inverts the traditional robotics paradigm, in which precision hardware was considered essential.
Edge Cases Don't Matter as Much as You Think
Kevin offers a controversial take: "If you look at LLMs, they still have very famous edge cases, like count the number of R's in 'strawberry', except if you put five R's. And they can't do that still...They're obviously still so useful. And they can solve IMO gold problems now, I guess. So it's like, I think it's similar in robotics...if you have a laundry folding robot in your home, like if it drops underwear like one in every 100 times, one in every 1000 times, like it's okay, it's still really useful actually" [00:52:27]. This challenges the traditional robotics obsession with near-perfect reliability.
Open Sourcing to De-Risk Scientific Uncertainty
PI open-sourced its first model, π0 (pi-zero), which is highly unusual for a well-funded startup. Karol explains the reasoning: "The biggest risk here is the scientific risk, is that the problem is just too hard, and it's going to take 30 years instead of like three years...So if that's the biggest risk, we want to do everything we can to burn it down as much as we can...if it is solved, then we get to capture some of the value. Other players get to capture some of the value, but if it's not solved, and none of us captures any value, there is no value being created" [00:14:30]. This reflects a sophisticated understanding of existential versus competitive risk.
The Generalist Beats the Specialist at Their Own Game
PI challenges fundamental assumptions about specialization. Karol explains: "We had teams that were focusing on language translation for a very long time, and at some point they hit a certain ceiling...And it turned out that the way to solve language translation is to, it's not by focusing on language translation, but by focusing on all of language...If you can, the more data they can absorb, the more patterns they can spot based on the diversity of data they see, the better they perform, even on those specialist tasks" [00:47:40]. This inverts the conventional entrepreneurial wisdom about focus and specialization.
Scaling Laws in Robotics Are Fundamentally Different
Karol challenges a common assumption, namely "that we are about to have scaling laws that look very, very similar to LLM and VLM scaling laws. I think scaling laws in robotics are very hard and we don't even know what the Y axis of that scaling law should be. And I think people tend to just rush towards putting something out there that they have some kind of scaling law of robotics based on what they've seen in LLMs and VLMs before. But I think this is actually going to be much more complicated" [01:03:15].
3. Companies Identified
Physical Intelligence (PI)
A robotics foundation model company building VLA (vision-language-action) models. "Physical Intelligence, also known as PI in the robotics industry, are building robot foundation models, and their goal is to build a model that can control any robot to do any task" [00:00:12]. They've "raised over $400 million from the likes of OpenAI, Lux Capital, Thrive Capital, and more" [00:00:40].
Google (RT-2 Project)
Pioneered the vision-language-action model approach. Kevin notes: "The vision language action model, which actually started at Google under Karol's supervision, was basically taking these VLMs and then adding one more adapter, like some sort of component or some way to have them produce, instead of text or images, robot actions" [00:03:18].
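The "adapter" Kevin describes can take several forms. One simple form, which the RT-2 paper reports using, discretizes each continuous action dimension into 256 bins so that the model can emit actions as ordinary tokens. The sketch below illustrates only that tokenization step. It assumes every action dimension is normalized to [-1, 1], and the function names and the 7-dimensional example action are hypothetical. It is not PI's or Google's code.

```python
from typing import List

NUM_BINS = 256          # bins per action dimension (assumption, following RT-2's reported setup)
LOW, HIGH = -1.0, 1.0   # assumed normalized range for every action dimension


def action_to_tokens(action: List[float]) -> List[int]:
    """Discretize each continuous action dimension into a bin index in [0, NUM_BINS - 1]."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)       # keep values inside the assumed range
        scaled = (clipped - LOW) / (HIGH - LOW)    # map to [0, 1]
        # int() floors; the upper bound HIGH is folded into the last bin
        tokens.append(min(int(scaled * NUM_BINS), NUM_BINS - 1))
    return tokens


def tokens_to_action(tokens: List[int]) -> List[float]:
    """Decode bin indices back to continuous values at the center of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t + 0.5) * width for t in tokens]


# Hypothetical 7-D action: xyz translation, xyz rotation, gripper.
action = [0.1, -0.25, 0.0, 0.3, -1.0, 1.0, 0.5]
tokens = action_to_tokens(action)     # → [140, 96, 128, 166, 0, 255, 192]
decoded = tokens_to_action(tokens)    # each value lies within half a bin width of the original
```

The design trade-off is resolution versus vocabulary size: more bins give finer control but require more distinct tokens for the language model to learn.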
Figure AI
Mentioned as having its own VLA approach. The host notes that "Figure has their Helix, which is their own version of a VLA" [00:11:11], though no detailed discussion follows.
1X Robotics
Referenced in the context of training efficiency. The host mentions: "I was having a conversation with Baron from 1X. And he was kind of saying...after you run the same task, and he said like 45 or 50 times, that incremental accuracy for the robot to perform that task better just completely diminished" [00:33:23].
4. Operating Insights
Co-locate All Teams for Fast Iteration
PI keeps its operations, hardware, research, and software teams physically together. Karol explains: "For data collection, we found that it's really important to have all the teams be as close together as possible. So we have the operations team right here, hardware team right here, research team, software team. So you want this interaction to be very, very tight, because we don't fully understand what data to collect to improve the model performance the most" [00:45:58]. This enables rapid learning about data quality and diversity requirements, which cannot easily be outsourced.
Deploy Early to Learn, Not Just to Commercialize
PI takes an unusual approach to deployment. Karol states: "We believe that the best way to collect that data or to learn really on how to deploy these models is to start to deploy them in the real world, start to have them do really valuable, economically viable jobs, and then learn from that" [01:01:41]. This treats deployment as a research tool rather than purely a commercialization step.
Optimize for Reducing Scientific Risk Over Competitive Risk
When deciding what to open-source versus keep proprietary, PI focuses on the biggest actual risk. Karol explains: "The biggest risk here is the scientific risk, is that the problem is just too hard...we are trying to increase the probability of the problem being solved as much as we can. Because if it is solved, then we get to capture some of the value" [00:14:46]. This framework helps prioritize IP strategy decisions.
5. Overlooked Insights
The Asynchronous Execution Problem
Kevin identified a subtle but important issue with VLA inference. He explains: "Before that, we would actually just have the model stop and think every half second or so...But what I thought was like, even in tasks where, what we call quasi-static or static tasks, where like nothing in the environment is moving super fast, it can still improve performance a little bit to do this asynchronous or parallel execution because when the robot stops, it kind of like wiggles a little and it changes the physics when it stops" [00:38:28]. The physical act of stopping creates unintended perturbations that affect performance even in seemingly static tasks: a non-obvious interaction between inference latency and physical dynamics.
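A toy timeline helps show why the two schedules differ. The following is a minimal sketch, assuming the model outputs fixed-length chunks of actions and that inference takes a fixed number of control steps. Both parameters, `chunk_len` and `latency`, are hypothetical. In the stop-and-think schedule the robot halts before every chunk. In the asynchronous schedule, inference for the next chunk starts while the current chunk is still executing, so the robot pauses only if inference is slower than a chunk. The sketch does not model the wiggle Kevin describes, nor how a real system keeps a new chunk consistent with the actions already in flight, and it is not PI's implementation.

```python
from typing import Tuple


def simulate(num_chunks: int, chunk_len: int, latency: int,
             asynchronous: bool) -> Tuple[int, int, int]:
    """Return (total_steps, idle_steps, mid_task_pauses) for a toy chunked control loop.

    All durations are in control steps. `chunk_len` (actions per chunk) and
    `latency` (inference time) are hypothetical parameters for illustration.
    """
    prev_start, prev_end = 0, 0   # execution window of the previous chunk
    idle, pauses = 0, 0
    for k in range(num_chunks):
        if k == 0 or not asynchronous:
            # Stop-and-think: inference begins only once the previous chunk has finished.
            infer_start = prev_end
        else:
            # Asynchronous: start inferring the next chunk `latency` steps before the
            # current one ends, but no earlier than when the current chunk started.
            infer_start = max(prev_start, prev_end - latency)
        ready = infer_start + latency       # time the new chunk becomes available
        exec_start = max(ready, prev_end)   # the robot executes one chunk at a time
        gap = exec_start - prev_end         # steps the robot stands still
        idle += gap
        if k > 0 and gap > 0:
            pauses += 1                     # robot halted in the middle of the task
        prev_start, prev_end = exec_start, exec_start + chunk_len
    return prev_end, idle, pauses


# Ten chunks of 25 steps, with inference taking 5 steps.
stop_and_think = simulate(10, chunk_len=25, latency=5, asynchronous=False)  # (300, 50, 9)
overlapped = simulate(10, chunk_len=25, latency=5, asynchronous=True)       # (255, 5, 0)
```

In this toy setting the overlapped schedule removes all nine mid-task halts. Each of those halts is a point where, by Kevin's account, the physical stop itself can perturb the task.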
Pattern Recognition Machines, Not Just Task-Specific Models
Karol offers a fundamental reframing: "What we are realizing with these large models is that these are less language models or vision language models or action models. These are just big pattern recognition machines. And you can give them a lot of data of all kinds of different forms and they're able to recognize patterns in that data to then produce predictable outputs" [00:04:32]. This suggests the revolution isn't about better "robot models" but about applying general pattern recognition to physical actions, a subtle but important distinction that opens up different solution paths.