Jagdeep Singh (Rhoda AI) — 10 Hours, Not 10,000: $450M to Train Robots on YouTube
- 01 Physical AI Is Still at Zero for Real-World Manipulation
- 02 Internet Video as the Only Viable Training Data Source for Robotic Generalization
- 03 The VLA Paradigm Is a Well-Funded Dead End
- 04 The Data Pyramid: Less Curation Produces Better Models
- 05 10–20 Hours of Robot Data vs. 270,000+ Hours from Competitors
- 06 Real-World Deployment Unlocks a Proprietary Data Flywheel
1. Key Themes
Physical AI Is Still at Zero for Real-World Manipulation
Despite years of hype, the number of deployed intelligent robots capable of real-world manipulation is effectively zero. Jagdeep distinguishes locomotion, inspection, and monitoring (where robots are already deployed) from manipulation (where they are not), and frames manipulation as the defining gap in AI's impact on the physical world.
"The number of deployed intelligent robots that are capable of manipulation in the real world is close to zero. We have robots that are doing things like locomotion and things like inspection and monitoring applications... but as far as applications that involve working with hands and arms manipulation, AI has not made an impact." 00:03:34
Internet Video as the Only Viable Training Data Source for Robotic Generalization
Rhoda's core thesis is that internet video — estimated at 80% of all internet data — is the only data source with the scale, diversity, and physical fidelity needed to train a truly general robotic model. All other approaches (teleoperation, simulation, synthetic data) are fundamentally limited by intentional curation.
"What source of data is there that has the scale, the diversity, and the ability to learn physics, that's already out there that we don't have to generate on our own? There's only one possible answer — that's internet video." 00:09:59
The VLA Paradigm Is a Well-Funded Dead End
The dominant approach in physical AI — Vision Language Action models trained on teleoperation data — is, in Rhoda's view, structurally incapable of producing generalization regardless of how much capital is spent. Competitors raising $1–2B+ are largely spending it on teleoperation data collection.
"A lifetime spent teleoperating robots is not going to be enough. It's going to be a drop in the bucket compared to what you need to really learn how to generalize... you're spending a lot of money on what we believe is a dead end path." 00:00:00
The Data Pyramid: Less Curation Produces Better Models
Rhoda's training architecture — the "data pyramid" — is counterintuitive: the broadest, least-curated base produces a stronger physics prior, which enables extreme data efficiency at the task-specific fine-tuning layer. Models trained on less curated data actually outperform those trained on more curated data.
"The models that tend to perform best, in our experience, are the ones that have less curated data. Because the less curated data, it turns out, forces the model to learn more about how the world works." 00:00:33
10–20 Hours of Robot Data vs. 270,000+ Hours from Competitors
The practical output of Rhoda's pre-training approach is a dramatic reduction in task-specific data requirements — from industry claims of 70,000 to 500,000 hours of robot data, down to 10–20 hours. This was validated in live factory settings with real material flow, not staged lab environments.
"We can train models to perform very sophisticated tasks... with on the order of 10 to 20 hours of robot data. Without that, you would have to collect tens of thousands of hours of data." 00:11:30
Real-World Deployment Unlocks a Proprietary Data Flywheel
Once the model is autonomous enough to deploy in real factories, it encounters the long-tail corner cases that no teleoperation or simulation dataset could intentionally capture. This data, collected via a technique Jagdeep calls DAGGER (most likely DAgger, short for Dataset Aggregation, an imitation-learning method), is proprietary and creates a compounding moat.
"Once you're running autonomously in real factories, now you're starting to see all the corner cases... and that data is proprietary. That's not available on the internet. Only a robot that's already autonomous and operating will be able to collect that kind of data." 00:18:45
Form Factor Should Follow Function, Not Human Aesthetics
Rhoda is building wheel-based robots, not humanoids with legs. It adopts the humanoid form only where it is functionally necessary (arms, vision, compute) and discards it where it is not (legs). In manufacturing environments, safety requirements such as the emergency stop (e-stop) make legs a liability, because a legged robot that is halted abruptly can collapse.
"A robot on legs is going to collapse and either damage itself or worse injure a human being. So we're not chasing the human form for its own sake." 00:36:46
The Integration Software Layer Is the Underrated Bottleneck to Scale
Solving the AI model is necessary but not sufficient. The "bridge software" that makes the full system push-button deployable in real factories is the unexpected friction point that slows commercialization, and the industry systematically underestimates it.
"The part that I wouldn't have listed high on the list is the software that pulls it all together... All this has to be done in a push button way where you've got one button and you say go and the model runs." 00:30:18
2. Contrarian Perspectives
Scaling Teleoperation Hours Is Not Progress — It's a Vanity Metric
While the industry races to announce ever-larger teleoperation datasets (70K hours → 270K hours → 500K hours), Rhoda argues this progression is meaningless. The framing of "more hours = better model" is precisely wrong because diversity, not scale of curated data, determines generalization.
"If you look at the evolution of VLAs over the past few years, you'll see some people claim that we trained on 70,000 hours of robot data. Then you'll see the next guy saying we train on 270,000 hours... and then you come to Rhoda, which is later in time, and we say we train on 10 to 20 hours of data." 00:20:55
The More Specialized a Robot, the Harder and Slower It Deploys
The conventional wisdom is that task-specific robots are easier to deploy because they are purpose-built. Jagdeep inverts this: specialization requires more surrounding infrastructure (conveyors, mechatronics), which is the actual deployment bottleneck. Generalist robots are paradoxically easier to roll out.
"The more specialized your robot is, the more infrastructure is required to run it. And the more generalist it is, the easier it is to roll it out." 00:35:34
Physical AI Model Training Does Not Require Frontier-Scale Compute
The prevailing narrative equates physical AI investment with LLM-style compute arms races. Jagdeep pushes back: physical AI models are fundamentally smaller than language models, and the capital being deployed by competitors is not going to compute — it's going to teleoperation data collection, which he believes is wasted.
"The physical AI models are not nearly as big as the models that we see in the language model space. You don't need the same level of compute to train those models." 00:08:38
Simulation Has a Fatal Flaw That Goes Beyond Sim-to-Real Gap
The standard critique of simulation-based training is the sim-to-real gap. Jagdeep adds a deeper, less-discussed structural flaw: simulation, like teleoperation, is inherently curated data. Even setting aside physics fidelity issues, the diversity problem alone is fatal to generalization.
"You have the exact same problem with diversity. You're collecting data that you intentionally collect. And diversity, it turns out, is an even bigger issue with generalization than pure scale alone." 00:13:58
Winner-Take-All Dynamics Will Emerge Much Earlier Than Expected in Physical AI
Jagdeep argues the physical AI market will consolidate around a single dominant model — not because of brand or distribution, but because the model that gets robust enough to deploy widely will collect disproportionate long-tail data, making it the only model anyone wants. This flywheel collapses the competitive window.
"In our view, it really is a winner-take-all kind of a scenario. Because the model that gets good enough to see more of these corner cases is the only model people want to deploy. But the model that people want to deploy is the one that gets more data to make it even more robust." 00:19:25
3. Companies Identified
Rhoda AI: AI-first robotics company training general-purpose manipulation models on internet video rather than teleoperation data. Raised $450M. The central subject of the episode.
"We can train models to perform very sophisticated tasks... with on the order of 10 to 20 hours of robot data. That's really a high watermark, maybe a low watermark for how much data is required." 00:11:30
Physical Intelligence (PI): Physical AI robotics company pursuing the VLA paradigm. Mentioned as a well-capitalized competitor.
"Physical Intelligence has raised 1.1 billion." 00:08:00
Skild AI (transcribed in the episode as "Skilled"): Physical AI robotics company. Named as a competitor in the VLA space.
"Skilled has raised 1.5 [billion]." 00:08:00
Figure: Humanoid robotics company. Named as the best-funded competitor in the VLA space.
"Figure 2.3, maybe more now." 00:08:00
Infinera: Optical networking company founded by Jagdeep Singh. Mentioned as his first successful deep tech company built against prevailing market assumptions.
"Optical networking with Infinera." 00:02:02
QuantumScape: Solid-state battery company co-founded by Jagdeep Singh. Mentioned as his second major deep tech bet.
"Solid state batteries with QuantumScape." 00:02:02
Boston Dynamics: Referenced as the iconic example of robotics demos that are compelling in lab settings but not representative of real-world deployment capability.
"You've probably seen lots of robots doing cool things on videos. Boston Dynamics and all of these viral videos. You can do backflips. You can do things like even fold T-shirts and make coffee. It turns out that all of those demonstrations are already in a lab setting." 00:04:20
NVIDIA: Referenced as the canonical example of how value in a technology stack can bypass the infrastructure layer and accrue at the inference and application layers.
"NVIDIA is a good example of this — they made all of this investment in the picks and shovels and then the value accrued to the inference layer and the application layer. Is the same thing gonna happen in physical AI?" 00:16:56
Amazon: Mentioned as a natural scaled customer for Rhoda's decanting task, given the universality of inbound material processing.
"If we do this task here, we can train for its variants at places like Amazon and anywhere else that materials are processed." 00:28:29
VSC Ventures: Early-stage venture fund, host of the podcast, investor in Rhoda's $450M round.
"You just announced this major capital raise, brought on some great investors, ourselves as a small part of it." 00:38:08
4. People Identified
Jagdeep Singh: Serial deep tech founder; CEO and co-founder of Rhoda AI; previously founded Infinera (optical networking) and co-founded QuantumScape (solid-state batteries). Identified as one of the very few founders to have successfully commercialized deep tech across three separate technology cycles.
"You're one of the few founders in physical AI that has scaled through many different cycles and understood how to take the concept phase all the way through commercialization." 00:00:38
Jay Kapoor: General Partner at VSC Ventures, host of the CLIMB podcast, investor in Rhoda.
"My name is Jay Kapoor, General Partner of VSC Ventures, and I'll be your host today." 00:00:38
5. Operating Insights
Treat Your Startup as a Ranked List of Hypotheses, Not a Vision
Jagdeep's most tactical operating framework: never allow conviction in your thesis to outpace your evidence base. Explicitly list every assumption underlying the business in order of risk and work through them sequentially. Each de-risked hypothesis directly translates to valuation.
"Think of your startup as simply a collection of hypotheses. Do not start drinking your own Kool-Aid. Realize that the whole idea you have is just a bunch of hypotheses. And think of your job as being a key risk officer where you literally just list out the key hypotheses in order of risk, and you attack them one at a time and try to de-risk the opportunity." 00:32:53
The Four Non-Negotiable Pillars Before Starting a Deep Tech Company
The checklist Jagdeep applies before founding a company: (1) a truly large unsolved problem, (2) differentiated technology rather than a commodity, (3) an exceptional team competitive with anyone globally, and (4) early customers who are not just interested but actively want to help you get to market. All four must be present.
"I look for customers that say, not just this is interesting, but this is so interesting that if you guys could build this, I want to help you get it to market. How do I help you?" 00:32:16
De-Risk in the Field, Not the Lab — Then Convert POCs to Revenue
The transition from POC to paid deployment is where most deep tech companies stall. Rhoda's operating priority for the year ahead is explicitly this conversion — and the discipline of running POCs in live factory conditions (not staged environments) is what makes the POC evidence credible enough to justify that conversion.
"We have multiple customer POCs that have already been successful in the field and the next logical step that anybody can assume is then to convert those POCs into real deployments and real revenue." 00:38:19
Being Contrarian Is Only Valuable When Paired With a De-Risking Strategy
Contrarianism alone is not a strategy — it's a starting condition. The operating discipline that makes contrarian bets valuable is the systematic reduction of risk through experiment, not conviction. This is what separates contrarian-and-right from contrarian-and-wrong.
"You want to be contrarian because if you're conventional, then the value is already priced in. And even if you're right, there's not a lot of value creation to be had. But if you're contrarian and wrong, you're still wrong. So how do you ensure you're contrarian and right?" 00:32:53
6. Overlooked Insights
The DAGGER Technique Is the Real Moat — Not the Model Architecture
The episode spends most of its time on the pre-training story (internet video, 10–20 hours), but Jagdeep briefly names a specific technique — DAGGER — as the mechanism for collecting long-tail, proprietary real-world data once the robot is autonomous. This is actually the harder-to-replicate compounding asset. Anyone with enough talent could eventually replicate the video pre-training methodology. But DAGGER-collected corner-case data from live deployments is permanently proprietary and grows only with scale. This is the actual source of durable competitive advantage, and it was mentioned only once without elaboration.
"As you deal with those edge scenarios using a technique called DAGGER, you're now getting data on the long tail distribution. And that data is proprietary. That's not available on the internet. Only a robot that's already autonomous and operating will be able to collect that kind of data." 00:18:45
The Real-Time + Accuracy Constraint Is an Unannounced Technical Moat That Most Observers Will Miss
Jagdeep briefly describes a specific technical bottleneck: making video-based physics predictions that are simultaneously physically accurate (no hallucinations) and fast enough to run in a few hundred milliseconds. He states that Rhoda is not aware of any other video model that can do both. This is an underappreciated technical moat because the two goals pull against each other: larger models tend to be more accurate but slower, so achieving both at once likely required non-obvious architectural work. This capability is a prerequisite to everything else Rhoda claims, yet it was mentioned almost in passing.
"We need to make video predictions of what's going to happen that are both physically accurate, they don't involve hallucinations, and they have to happen in real time, in a few hundred milliseconds. We're not aware of another video model that can make video predictions that are that accurate and real time." 00:17:17