Pete Florence (Generalist) — Scaling...

1. Key Themes

Scaling Laws Have Finally Arrived in Robotics

The central thesis of the episode is that robotics has crossed a threshold where the same scaling dynamics that transformed language models now apply to physical interaction models. Pete Florence describes a landmark result with Gen Zero where increasing data continuously improves performance across all tracked tasks simultaneously — a first for embodied AI.

"We have a very general purpose recipe. And to be honest, it doesn't even really necessarily matter too much what the x-axis is other than like the x-axis is something that we can continually do more of. And the y-axis is some measure of how good the robots are. And we see that we can just continue to pour in more and more effort on that x-axis and the y-axis continues to get better and better." — Pete Florence [00:09:22]
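
The relationship Florence describes, where more of the x-axis input keeps improving the y-axis metric, is commonly summarized by fitting a power law to measured points. The sketch below is illustrative only: the function name `fit_power_law`, the variable names, and every number in it are made-up assumptions, not Generalist's data or method. It fits y = a * x^b by least squares in log-log space and extrapolates one step further along the x-axis.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


# Hypothetical, synthetic points: hours of training data vs. an error metric.
# They are generated from a known curve so the fit can be checked by eye.
DATA_HOURS = [1_000, 10_000, 100_000, 1_000_000]
ERROR_RATE = [2.0 * h ** -0.25 for h in DATA_HOURS]

a, b = fit_power_law(DATA_HOURS, ERROR_RATE)
predicted = a * 10_000_000 ** b  # extrapolate to 10x more data
print(f"a={a:.3f}, b={b:.3f}, predicted error at 10M hours={predicted:.4f}")
```

A negative fitted exponent `b` means the error metric keeps falling as the input grows; an exponent near zero would indicate a plateau.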

The Generalist Always Eventually Invades the Specialist's Territory

Florence makes a strategic argument that narrow/specialist robotics is a fundamentally losing long-term bet, drawing on the lesson from ML history that general models always eventually encroach on niche domains.

"Every time you try and think, oh, I'm just going to have this narrow little model in this one little domain. And that's going to be my little niche. And then the general models will do other things but not my thing. That's not a bet long term that we think is the right one to take." — Pete Florence [00:03:33]

Cross-Task Transfer: Training on Everything Makes Each Individual Task Better

A non-obvious and profound insight is that training on all tasks simultaneously improves performance on every individual task — the opposite of the intuition that you need task-specific tuning to master specific behaviors.

"All of the data makes everything better. And not just the data but the way the models are trained. Once you sort of take the leap of faith that all of the tasks that you can possibly think of, trained all together, do indeed make the model better at all the individual little things, yeah, you want a general purpose system." — Pete Florence [00:04:25]

Double Descent: The Counterintuitive Physics of Model Scaling

Florence explains the concept of double descent — where making a model bigger, counterintuitively, reduces overfitting when you have enough data. This is the foundational scientific insight that unlocks Gen Zero's scaling behavior.

"It wasn't forever ago when the field started to understand this concept of double descent. And actually, it depends on the regime in which you're in and it depends on how much data you have. And there are actually regimes where if you have enough data, well, then making your model actually much bigger, it is actually much more effective at avoiding these sort of overfitting effects." — Pete Florence [00:11:54]

Real-World Data as the Critical and Scarce Input

While synthetic data is widely discussed as a solution to robotics' data scarcity, Florence signals a strong conviction that real-world data remains essential and is Generalist's primary focus — a meaningful strategic differentiation.

"All of the data that is in Gen Zero that we've talked about, including the sheer amount of it, that is all real world data. We have many different threads in synthetic, but we do really believe that real world data is essential." — Pete Florence [00:14:26]

Multimodal Language Models as the Robot Brain — Not Just a Component

Florence traces the intellectual evolution from LLMs as auxiliary tools for robots to LLMs becoming the actual brain of the robot. He was personally involved in pioneering this at Google, noting it required building a custom multimodal model (PaLM-E) to make it work.

"The most powerful way is to just take the language model and make sure it is a multimodal language model, which back at Google, like we actually had to at the time, like there was basically only one multimodal language model that existed before the one that we made back at Google, which is called PaLM-E. And so we had to make our own. And then we just made the whole thing directly, like the brain of the robot rather than some type of engineered system." — Pete Florence [00:07:45]

Closed-Loop Sensing as the Definitional Core of Next-Gen Robotics

Florence draws a sharp line between traditional factory robots (which are technically closed-loop at a low level) and the new generation of robots capable of multimodal sensing and generalized skill transfer — responding to the world rather than executing pre-programmed sequences.

"Figuring out how to make decisions given observations of the world, you know, including different types of stimuli, I think that is the core of robotics in general... things like just common sense in the world, like physically, like being able to recover from all the different edge cases that people run into, or being very robust to, like, no matter what happens, if somebody changes what the environment looks like or the packaging changes or some other notion of the task changes over time, like, those are the types of things that we take for granted as being very easy but are the types of things that we need to solve for the next generation of robotics." — Pete Florence [00:19:02]

Robotics Will Diffuse Faster Than Autonomous Vehicles Because the Risk Profile Is Lower

Florence makes a subtle but important point: unlike self-driving cars, where a single hard threshold (safe public road navigation) had to be cleared before deployment, robots have many more incremental use cases that can be shipped safely before full capability is achieved.

"For the next generation of robotics, some of the, you know, it is going to be a long journey in some ways for certain levels of just like full capabilities. Yet at the same time, there's a lot of robots that can be shipped to do things that are not as dangerous to humans as driving on the public roads, right? So I think that there's just a lot of different types of robots, a lot of different types of use cases that people will want to use them for. It's not as much of a singular problem as self-driving has been." — Pete Florence [00:22:41]

2. Contrarian Perspectives

Knowing When the Robot Doesn't Know Is as Valuable as Knowing

Florence argues that recognizing the limits of one's knowledge — saying "I don't know" — is not a weakness but a critical safety and reliability property, and one currently missing from most models. This runs counter to the industry's optimization focus on capability maximization.

"There's kind of two ways to avoid a hallucination for a language model, right? You can either say the correct thing or you can say, I don't know. So that's the other way to avoid hallucinations is sort of recognize the limits. And I think that is a very useful type of concept to have for, you know, physically acting robotics as well." — Pete Florence [00:20:50]
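
One simple way to picture the "I don't know" option for a physically acting robot is a confidence gate in front of action selection. The following is a minimal sketch under stated assumptions: `choose_action`, `min_confidence`, and the example scores are hypothetical, and it presumes the scores are reasonably calibrated probabilities, which real models often are not. The episode excerpt does not say how Generalist approaches this.

```python
def choose_action(action_scores, min_confidence=0.8):
    """Return the best-scoring action, or None to signal "I don't know".

    action_scores: dict mapping a hypothetical action name to a score in [0, 1].
    min_confidence: assumed threshold below which the robot should abstain.
    """
    if not action_scores:
        return None  # nothing to choose from, so abstain
    best_action, best_score = max(action_scores.items(), key=lambda kv: kv[1])
    return best_action if best_score >= min_confidence else None


# Confident case: one action clearly dominates, so the robot acts.
confident = choose_action({"grasp": 0.93, "push": 0.07})

# Ambiguous case: no action clears the threshold, so the robot abstains
# (for example by pausing or asking a person for help).
uncertain = choose_action({"grasp": 0.55, "push": 0.45})

print(confident, uncertain)
```

Raising `min_confidence` trades fewer wrong actions for more abstentions; the right setting depends on how costly a mistake is for the task.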

Making the Model Bigger Fixes Overfitting — The Opposite of What You Were Taught

Classical ML doctrine says overfit models should be made smaller. Florence argues the opposite is true in the data-rich regime, and that reaching this regime in robotics — and correctly configuring training to observe it — is what took Generalist a year of iteration to achieve.

"If your model looked like the validation loss was going up, you would call that overfitting and you would say, okay, well, I need to somehow reduce my overfitting. And one way that you might try is to actually make the model smaller. And there are actually regimes where if you have enough data, well, then making your model actually much bigger, it is actually much more effective at avoiding these sort of overfitting effects." — Pete Florence [00:11:54]

Focus Is a Research Advantage, Not Just an Operational Constraint

Against the instinct that more data sources are always better, Florence argues that research culture and organizational focus are themselves performance variables. Spreading attention across too many data paradigms undermines the ability to push the frontier.

"If human organizations operated such that focus could be infinitely sharded... then, yeah, you would want every single data source you can. But I think the reality is that building a culture of a team where you are really pushing the frontier, it's helpful to have a certain amount of focus on particular bets that you are making in terms of research." — Pete Florence [00:15:16]

Specialist Robotics Is a Long-Term Losing Bet Despite Current Dominance

The most commercially successful robots today (Roombas, Amazon warehouse robots) are specialists. Florence's contrarian view is that their current success is irrelevant to long-term competitive dynamics — the general model will eventually absorb every specialty domain.

"Every time you try and think, oh, I'm just going to have this narrow little model in this one little domain... that is not a bet long term that we think is the right one to take." — Pete Florence [00:03:33]

3. Companies Identified

Generalist (Gen Zero)

Robotics foundation model company building embodied AI that scales with physical interaction data. Highlighted as having achieved the first observable scaling law in robotics — where more real-world training data continuously improves performance across all tasks simultaneously. Florence is co-founder and CEO.

"We are able to take a model that is trained on very general purpose, physical interaction data. And we train on more and more of it and every single task that we are tracking continues to get better. And that is like a landmark type of moment, at least in terms of like relative to what we've seen before." — Pete Florence [00:09:22]

Google (DeepMind / Robotics)

Pete Florence's prior employer, where he worked on early vision-language-action (VLA) models and built the PaLM-E multimodal language model to enable a language model to serve as the literal brain of a robot.

"Back at Google, like we actually had to at the time, like there was basically only one multimodal language model that existed before the one that we made back at Google, which is called PaLM-E. And so we had to make our own." — Pete Florence [00:07:45]

Waymo

Autonomous vehicle company cited as the most visible proof point of what embodied AI can achieve at scale, and used by Florence as a benchmark for thinking about the pace of robotics deployment.

"There's a lot of robots that can be shipped to do things that are not as dangerous to humans as driving on the public roads, right? So I think that there's just a lot of different types of robots... It's not as much of a singular problem as self-driving has been." — Pete Florence [00:22:41]

4. People Identified

Pete Florence

Co-founder and CEO of Generalist. Former researcher at Google, where he worked on foundational vision-language-action models and helped build PaLM-E, one of the first multimodal language models used as a robot brain. He has been in AI/robotics research long enough to have witnessed the rise of language models from the mid-2010s onward. He now leads Generalist's Gen Zero, which he claims has achieved genuine scaling laws for embodied robotics.

"I haven't been around forever, but I've been around long enough that I remember well when people were starting to whisper that, hey, you know, these language models are starting to really work." — Pete Florence [00:05:11]

Andra Keay

Mentioned in the intro as a voice characterizing the current moment in humanoid robotics. Described as saying "Humanoids are having a Cambrian moment."

"As Andra Keay says, Humanoids are having a Cambrian moment, and the progress from just 12 months ago is pretty amazing." — Jeff Frick [00:02:42]

5. Operating Insights

Achieving a Research Breakthrough Took a Year of Iteration — Not a First Try

Florence is direct that Gen Zero's scaling result was not a clean, first-attempt success. This is a useful reminder for operators building R&D teams: landmark results require sustained iteration and should not be judged on early attempts failing to show the desired effect.

"I would say it wasn't like the first time we tried all this, we sort of beautifully got this type of result. It took a lot of iteration over a year from the team." — Pete Florence [00:12:43]

Pre-Training Must Be Completely Separated From Task Design to Achieve True Generalization

Florence identifies a subtle but critical design flaw in prior foundation model attempts for robotics: task-specific assumptions "seep in" to the training process even when developers claim generality. The operating lesson is to structurally enforce separation between capability-building and task-solving at the architecture and process level.

"The way in which we scale pre-training is completely separated from any idea of how we think about any sort of particular task that we are solving. We continue to scale general purpose data. Every single task that we track continues to get better." — Pete Florence [00:10:13]

6. Overlooked Insights

The Parameter Threshold for Emergent Scaling Is Shockingly Specific — and Quantified

Florence's keynote described specific parameter counts (1B, 6B, 7B) at which qualitatively different scaling behavior emerged. This is not a vague "bigger is better" claim — it is a precise empirical finding that the transition from diminishing returns to continuous improvement happened at a specific threshold. For investors, this means there may be a minimum viable model size below which robotics foundation model companies simply cannot exhibit scaling dynamics, creating a structural barrier to entry that filters out undercapitalized competitors. Jeff Frick's intro captures this with precision:

"At a certain point, the model really stops reacting positively to more data and more training... And then at seven billion parameters, something kind of magical happened. It just kept getting better. So the more data, the more trials you pumped into the machine, the better and better it got across skills and behaviors." — Jeff Frick [00:00:49]

"Intelligence Too Cheap to Meter" Is Coming to the Physical World

Florence briefly invokes — and then extends — the concept of intelligence too cheap to meter into the physical domain. This is a throwaway line in the conversation, but it implies that the economic model for physical labor could be fundamentally disrupted in the same way compute costs were. The framing of a robot as a productivity amplifier rather than a replacement is the more palatable near-term version, but the endpoint Florence is gesturing at is labor as an asymptotically zero-cost input.

"Over in more the LLM model providers, there's this concept that we eventually might be able to reach, you know, intelligence too cheap to meter... I think a similar thing happening in the physical world... the concept of just having a robot that could help you with almost any task that you can imagine and have that be a very productive partner to amplify your productivity. We think that is very much a world in which we're headed." — Pete Florence [00:16:11]