晚点聊 LATETALK

170: [Embodied AI Quarterly Report, 26Q2] The World Model Gale Keeps Blowing, and the People Who Don't Want to Be Labeled

Date: June 27, 2026
Source: 晚点聊 LATETALK
Participants: Manchi, LatePost team (晚点团队)
Key Takeaways (6 items)
  1. The Humanoid Robot Marathon as a Technology Proving Ground
  2. Figure AI's 100-Hour Live Stream as a 0-to-1 Proof of Industrial Value
  3. World Models Transition from Lab Concept to Industrial-Grade Product
  4. Dexterous Hands Enter a New Competitive Phase, With Direct-Drive Leading Research
  5. Data Collection Paradigms Are the Leading Indicator of Model Breakthroughs
  6. Embodied AI Model Architectures Are Converging: VLA + World Model Fusion

Episode: LateTalk #170, Embodied AI Q2 2026 Quarterly Report
Guest: Chen Zhe (Peter), Founding Partner of AlphaEast
Host: Manchi


1. Key Themes

The Humanoid Robot Marathon as a Technology Proving Ground

The Beijing Humanoid Marathon marked a step-change in public perception and technical benchmarking. Honor's robot division swept gold, silver, and bronze in the autonomous navigation category, finishing in approximately 50 minutes. That is roughly 3.2 times faster than the prior year's winning time of 2 hours 40 minutes (160 minutes), which was achieved under remote control rather than autonomously.

"I think this marathon competition proved that a team with high-end manufacturing experience and strong organizational capability can quickly produce very competitive humanoid robot products with sufficient resources and talent density." [00:10:09.800]

"The broader implication is that in China there are quite a few large companies with capabilities like Honor's — in manufacturing, talent, and capital. Companies like Xiaomi, Xpeng, Li Auto and other EV and smartphone manufacturers have already begun seriously investing in the humanoid body game." [00:10:39.380]

Figure AI's 100-Hour Live Stream as a 0-to-1 Proof of Industrial Value

Figure AI ran three humanoid robots continuously for over 100 hours, sorting 130,000 packages at roughly one every three seconds. It was the first public, large-scale demonstration of humanoid robots performing real industrial work. Peter frames this as a watershed moment of public education, analogous to the effect that robot appearances at the Spring Festival Gala had within China.
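
The throughput figures quoted for the livestream can be sanity-checked with a few lines of arithmetic. The sketch below uses the rounded numbers from this summary (100 hours, 130,000 packages, three robots) and assumes, purely for illustration, that the load was split evenly across the robots; the source does not state how work was divided.

```python
# Rounded figures as quoted in the summary; the real run was "over 100 hours".
HOURS = 100
PACKAGES = 130_000
ROBOTS = 3

total_seconds = HOURS * 3600

# Seconds per package for the three robots combined.
aggregate_interval = total_seconds / PACKAGES

# Seconds per package for one robot, ASSUMING an even split (not stated in the source).
per_robot_interval = aggregate_interval * ROBOTS

print(f"combined:  {aggregate_interval:.2f} s per package")
print(f"per robot: {per_robot_interval:.2f} s per package")
```

With these inputs the combined rate comes out just under three seconds per package, which suggests the "one every three seconds" figure describes the three robots together; under the even-split assumption each robot would handle one package roughly every eight seconds.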

"Figure is the first company in the world to demonstrate such a scene to a global public audience through a live broadcast. I think the demonstration value is very obvious." [00:15:56.200]

"The logistics parcel sorting scenario is very suitable — it requires a certain degree of generalization and general manipulation ability, and it is a continuous, long-duration task that is very unsuitable for a human to work at for extended periods." [00:15:56.200]

World Models Transition from Lab Concept to Industrial-Grade Product

NVIDIA's Cosmos 3 — released June 1 — is identified as the quarter's defining technical benchmark. Its Mixture of Transformers (MoT) architecture unifies autoregressive reasoning and diffusion-based generation in a single omni-model capable of ingesting and outputting text, image, video, audio, and action modalities natively.

"I think Cosmos 3 is the benchmark world model of this quarter, because it is arguably the first fully open-source Omni Model in the market... I believe it genuinely pulled the world model concept from the laboratory to an industrial-grade, deployable state." [00:05:51.720]

"Researchers I spoke with at top North American labs doing LLM work didn't really understand why we invented VLA and World Model as opposing concepts. In their view, these are the same thing — and the best model should simply be an Omni Model, which is exactly the direction Cosmos 3 represents." [01:12:13.880]

Dexterous Hands Enter a New Competitive Phase, With Direct-Drive Leading Research

At ICRA (Vienna), Fino (5G) released its second-generation direct-drive dexterous hand, which Peter identifies as the clear standout: it matches Sharpard's ~20 degrees of freedom at roughly half the volume and solves the serious thermal and back-drivability issues of the first generation.

"Fino and Sharpard are probably the two most advanced companies globally doing direct-drive high-degree-of-freedom dexterous hands... If you put Fino's hand and Sharpard's hand side by side, you'll find that Fino's hand is roughly half the volume of Sharpard's. In terms of simulating the size of a real human hand, Fino's advantage is very obvious." [00:38:12.200]

"Whoever can more quickly provide reliable, stable, and affordable dexterous hand supply will more quickly become the de facto standard of the industry — and more algorithms and software will be designed around your hardware architecture." [01:00:38.120]

Data Collection Paradigms Are the Leading Indicator of Model Breakthroughs

Peter offers a systematic history of data collection evolution — from Aloha teleoperation (2023) → UMI bodyless collection → EgoCentric first-person video → NVIDIA Isaac (Sonic) whole-body motion capture — and argues that today's data investments predict model breakthroughs in 3–6 months.

"Every cycle of model paradigm iteration is essentially a change in the data paradigm. By watching what data collection companies are doing, we can get an early read on where the industry is heading." [00:30:25.440]

"The data we are collecting today will likely translate into model breakthroughs three to six months from now." [00:34:21.300]

Embodied AI Model Architectures Are Converging: VLA + World Model Fusion

Both Pi 0.7 (Physical Intelligence) and Gen1 (Generalist) represent meaningful iterations beyond classical VLA. Pi 0.7 attaches a lightweight world model for sub-goal image prediction; Generalist trained a from-scratch Transformer on 500,000 hours of UMI-style bodyless data without relying on a pre-trained VLM backbone.
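
The Pi 0.7 data flow described above (a lightweight world model predicts a sub-goal image, and that prediction conditions the action generator) can be illustrated with a toy sketch. Everything here is hypothetical: the function names, the list-of-floats "image", and the placeholder arithmetic are assumptions made for illustration and do not reflect Physical Intelligence's actual model or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: List[float]   # stand-in for camera pixels (hypothetical representation)
    instruction: str     # language command; unused by the placeholders below


def predict_subgoal_image(obs: Observation) -> List[float]:
    """Hypothetical lightweight world model.

    Placeholder logic: move every pixel halfway toward 1.0 to mimic
    'what the scene should look like a few steps from now'.
    """
    return [p + 0.5 * (1.0 - p) for p in obs.image]


def generate_action(obs: Observation, subgoal: List[float]) -> List[float]:
    """Hypothetical action head conditioned on the current image and the sub-goal.

    Placeholder logic: the action is the per-pixel gap between sub-goal and current image.
    """
    return [g - p for p, g in zip(obs.image, subgoal)]


def act(obs: Observation) -> List[float]:
    # Step 1: predict a future image; step 2: let it steer the action generator.
    subgoal = predict_subgoal_image(obs)
    return generate_action(obs, subgoal)


obs = Observation(image=[0.0, 0.5, 1.0], instruction="fold the towel")
action = act(obs)
print(action)
```

The point of the sketch is only the ordering: the sub-goal prediction is computed first and passed into action generation as an extra input, rather than the policy mapping observation to action directly.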

"I can summarize Pi 0.7's biggest difference from previous versions in one sentence: it grafts a lightweight world model onto a traditional VLA, and this lightweight world model can provide predictions of future task images, which then influence the generative model to produce corresponding actions." [01:20:29.940]

"Generalist collected 500,000 hours of real-world interaction data of a UMI-type bodyless collection... trained a model completely from scratch, without relying on a pre-trained VLA. This reflects the team's high confidence in their own technical approach and the strong scaling capability of their model." [01:23:23.100]

The Critical Unanswered Question: Will Embodied AI Models Be Won by Specialists or General-Purpose Giants?

Peter raises what he calls the most fundamental open question in the field: whether the embodied AI model layer will ultimately be provided by Anthropic, OpenAI, or Google rather than dedicated robotics startups.

"The more fundamental question is: what reason do we have to believe that, many years from now, the so-called Embodied AI model will not be provided by general-purpose model giants like Anthropic, OpenAI, or Google? If the Omni Model route is ultimately proven effective and useful, can all of a robot's spatial understanding and action prediction be merged into a larger general-purpose model?" [01:32:06.640]

OpenAI Robotics Relaunch Signals a Technical Inflection Point

Sam Altman formally announced OpenAI's robotics team expansion in Q2, led by Aditya Ramesh (creator of DALL-E), whose internal team was previously called "World Simulation Research." Peter interprets this not as a strategic pivot but as a signal that OpenAI sees genuine technical readiness — and notes the compute requirement is modest by their standards.

"The peak compute budget for Embodied AI model training today is roughly at the ten-thousand-GPU-card level. This is already the ceiling for the entire embodied AI research compute today. By that standard, for OpenAI or Anthropic, this is not a particularly large compute budget." [01:35:28.300]

"OpenAI Robotics launching so publicly at this moment is actually a signal that they see technical maturity and a strong connection to their core model capabilities." [01:36:56.460]

China's Structural Disadvantage: Inability to Tolerate Long Commercialization Timelines

Peter draws a sharp contrast between US and Chinese startup cultures. US companies like Physical Intelligence and OpenAI (pre-ChatGPT) ran for years without revenue; Chinese investors and capital markets cannot tolerate this, forcing all Chinese embodied AI companies into both full-stack ambitions and early commercialization posturing simultaneously.

"In China today, if I come out and say I'm building a company and explicitly tell investors that for the next ten years I will absolutely not commercialize and absolutely not generate revenue — the problem is how do you attract sufficient resources, talent, good capital market response, and ultimately how do you address the downstream public markets challenge?" [01:44:48.150]

"I think in China's market, two things are very hard for startups: first, you cannot say you'll have no revenue for ten years. Second, you cannot say you are only a brain company or only a hardware body company." [01:51:42.070]


2. Contrarian Perspectives

The Tendon-Driven (Sinew-Driven) Dexterous Hand Is a Dead End for Independent Companies

While most large Chinese companies are following Tesla's Optimus tendon-driven blueprint, Peter argues that any independent dexterous hand company using this approach will inevitably become a captive customization shop for large OEMs rather than a scalable product company.

"For an independent dexterous hand company, I think doing a full direct-drive approach — one that doesn't depend on the forearm — may be the only or better route. Because if you're a solution that requires deep forearm integration, you will in all likelihood become a large company's customization shop. It will be very hard to independently build a standardized product company." [00:59:39.400]

"Chinese large companies choose to follow Tesla's tendon-driven route partly because the engineers don't want to bear the risk of choosing a different direction. If Tesla turns out to be right and you went a different way, who takes responsibility for that mistake?" [00:59:09.440]

Dexterous Hand Data Cannot Be Commoditized the Way Lidar Data Was

The conventional assumption is that third-party data service companies will emerge for dexterous hand data as they did for LLM training data. Peter argues the analogy breaks down because dexterous hand data is inseparable from the specific hardware architecture — unlike lidar point cloud data, which is largely hardware-agnostic.

"In lidar, many manufacturers' algorithms do not depend on the specific lidar model or configuration — different lidar point cloud data can all be fed through a pipeline and trained into a larger model. So lidar companies have not had high bargaining power over data. But for dexterous hands, I think it's different — the data dimensions and formats are closely tied to the hardware design and solution design." [00:51:52.220]

Humanoid Robot Body Companies That Don't IPO in 2026 Face Existential Risk

Peter argues that for pure hardware body companies, the IPO window is not merely a financing event but a survival threshold. Large consumer electronics and auto manufacturers (Honor, Xiaomi, Xpeng, Li Auto) are entering the market with superior manufacturing scale and reliability — and will crowd out smaller players who haven't yet secured sufficient resources.

"For those companies focusing purely on the hardware body, a key hurdle this year is whether they can IPO and reach a relatively safe and resource-rich state. If they can't achieve this, then by next year — say 2027 — large manufacturers like Honor, with more resources, along with smartphone and auto OEMs, will have their robot bodies entering the market. And these companies have much more experience in large-scale manufacturing, reliability, and consistency." [00:09:43.960]

The "Full-Stack" Imperative in China May Be a Trap, Not a Moat

Most Chinese embodied AI companies are racing to become full-stack (brain + body + dexterous hand) because the market punishes dependency on partners. Peter suggests the highest-value companies globally (Anthropic, OpenAI) are not full-stack, and this forced full-stack posture may dilute focus and destroy returns.

"The eventual big winners will most likely be full-stack companies — like Apple, Xiaomi, Huawei. But what's interesting is that today the highest market-cap companies are actually not full-stack. Anthropic or OpenAI, if they eventually go public, can't really be called full-stack companies." [01:51:13.490]

Behavior Cloning for Dexterous Hands Is Two Years Behind VLA Maturity

The current state of dexterous hand manipulation models resembles the Aloha teleoperation era of 2023 — still in early demonstration mode, not yet approaching generalized policy. Companies showing impressive demos (Genesis, etc.) are doing sophisticated behavior cloning, not robust generalization.

"Today's dexterous hand manipulation paradigm is still a lot like two years ago when we were using Aloha for behavior cloning — cloning human behavior. So if you change the task, or the environment changes slightly, the success rate may drop quite noticeably." [00:45:31.640]


3. Companies Identified

Honor (荣耀) Robot Division Smartphone OEM Honor's robotics division, established approximately two years ago, with roughly 100 to 200 employees. Why mentioned: Won gold, silver, and bronze at the Beijing Humanoid Marathon in the autonomous navigation category, with all three robots finishing in approximately 50 minutes, defeating specialist robotics startups Unitree and Beijing Humanoid Robot Innovation Center. Demonstrated that large consumer electronics manufacturers with capital, manufacturing expertise, and engineering talent density can rapidly produce world-competitive humanoid robots.

"Honor's victory in the marathon is a harbinger of what the future competitive landscape of this market may look like." [00:11:07.980]

Figure AI US humanoid robot company. Why mentioned: Ran the industry's first large-scale public industrial deployment demonstration — three humanoid robots live-streaming parcel sorting for 100+ hours, processing 130,000 packages at ~3 seconds each using their Helix 02 model.

"Figure is the first company in the world to demonstrate such a scene through a live broadcast to a global public audience." [00:15:56.200]

Physical Intelligence (π) US robotics AI company, maker of the Pi series VLA models. Why mentioned: Consistent iterative innovation leader; Pi 0.7 introduced a lightweight world model sub-goal prediction layer on top of a traditional VLA, representing a meaningful architectural advance.

"Pi has always been leading the frontier of VLA research... Pi 0.7's biggest difference: it grafts a lightweight world model onto a traditional VLA." [01:20:29.940]

Generalist (创造者/GEN1 team) US robotics AI company led by a team that includes Peter Florence, focused on training embodied models from scratch on native physical interaction data. Why mentioned: Gen1 reached 99% success rates on complex long-horizon tasks (up from ~60%); the company collected 500,000 hours of UMI-style bodyless data and trained the model without a pre-trained VLM backbone, demonstrating strong scaling-law behavior.

"Generalist has always claimed they have found the key to the scaling law." [01:23:50.700]

NVIDIA (英伟达) Why mentioned: Released Cosmos 3 on June 1 — the quarter's defining world model benchmark. First fully open-source Omni Model with Mixture of Transformers architecture. Also credited for Isaac (Sonic) whole-body motion capture open-source work enabling humanoid full-body data collection.

"Cosmos 3 pulled the world model concept from the laboratory to an industrial-grade, deployable state." [00:05:51.720]

Fino (5G / 灵巧手 direct-drive company) Chinese direct-drive dexterous hand company. Why mentioned: Second-generation hand released at ICRA was the clear standout of the conference — 20 active degrees of freedom, approximately half the volume of Sharpard's hand, solved thermal and back-drivability problems of gen one, attracting extensive hands-on attention from global researchers.

"Fino and Sharpard are probably the two most advanced companies globally doing direct-drive high-degree-of-freedom dexterous hands... Fino's hand is roughly half the volume of Sharpard's." [00:38:12.200]

Sharpard Global pioneer in high-DOF direct-drive dexterous hands (~22 DOF, ~$50,000). Why mentioned: Defined the ICRA 2025 benchmark for dexterous hands; Fino's second-generation hand is now seen as catching up with it and in some ways surpassing it. Used by leading global embodied AI research labs.

"Last year at ICRA, the biggest impression Sharpard made was releasing its high-degree-of-freedom dexterous hand. This year, the deepest impression was Fino's second-generation hand." [00:36:17.120]

Genesis Chinese company (founded 2024), started in robot simulation environments, pivoted to dexterous manipulation and full-body system development. Why mentioned: Released a dexterous manipulation model in May 2026 using customized Fino hands and citing ~200,000 hours of training data; showcased 20-step cooking demonstrations and fine manipulation, including rotating Rubik's cubes and playing piano.

"Genesis's May release represents the current state-of-the-art performance for high-DOF dexterous hand manipulation on the market." [00:43:06.060]

Xinmojiyu (心动机缘 / Agility-like Chinese company) Chinese humanoid robot company. Why mentioned: Demonstrated postal parcel sorting in collaboration with China Post and SF Express; deployed a fully autonomous sorting pipeline that includes flipping packages and scanning barcodes. Peter states this is fully autonomous and on par with Figure AI's US demonstration.

"The information I have is that Xinmojiyu has done long-duration testing and training not only at China Post but also at SF Express... China's progress in humanoid robot deployment in logistics and industrial scenarios is absolutely not behind American companies." [00:25:39.980]

Unitree (宇树) Chinese humanoid robot company, IPO approved on China's STAR Market in Q2 2026. Why mentioned: First Chinese embodied AI company to go public; sets the valuation anchor for the entire industry; open hardware ecosystem has enabled third-party algorithm development (Sonic retargeting) that accelerated whole-body motion capture research.

"Unitree's IPO should be a landmark event for the development and investment of the entire embodied AI industry. Essentially it establishes a valuation anchor point for all leading embodied AI companies today." [01:38:24.480]

Sileo Future (西诺未来) Chinese dexterous hand company with proprietary electric cylinders and robot joints, backed by major Chinese OEMs including Li Auto and JD. Why mentioned: Released Flex2 hybrid (tendon + direct-drive) dexterous hand at ICRA — addresses tendon routing complexity by putting heavy-load motors in the forearm (tendon-driven) and fine-motion motors in the palm (direct-drive).

"Sileo Future has received investments from many large Chinese companies as strategic investors... basically all large companies doing humanoid robot research have adopted a tendon-driven approach similar to Tesla's." [00:58:39.860]

Boston Dynamics Why mentioned: Spot robot dog (thousands of units deployed) used in oil and gas industrial inspection as a deployment case for Google's Gemini Robotics ER1.6 model — reading instrument panel numbers, detecting anomalies in complex terrain.

"Boston Dynamics' Spot currently has an installed base in the thousands, genuinely deployed in industrial inspection — complex oil and gas field scenarios, going up and down stairs, checking instrument panel readings." [01:30:39.140]

Gemini Robotics (Google) Google's robotics division. Why mentioned: Released ER1.6 (Embodied Reasoning 1.6) in April 2026 — a vision-language model with enhanced spatial reasoning, used as backbone for partners including Boston Dynamics Spot and Apptronik humanoid. Google's strategy is to be the "Android of robotics."

"Google definitely wants to occupy a position that is closer to a brain, an API, a software layer." [01:30:10.520]

Hairo (海柔) Chinese warehouse automation robot company (portfolio company of Peter's). Why mentioned: Cited as an example of a previous-generation 2B robotics company that has solved standard logistics automation and is well-positioned to extend into humanoid applications for non-standard, flexible logistics scenarios.

"Companies like Hairo and XYZ that have been deeply rooted in logistics for a long time — of course they can expand in this direction." [00:26:37.480]

Pudu Robotics (普渡) Chinese robotics company, originally food delivery robots, now a leading commercial cleaning robot company. Why mentioned: Used as a positive example of a previous-generation 2B robotics company that successfully pivoted and expanded its addressable market.

"Pudu, which previously made food delivery robots, is now a top company in commercial cleaning." [00:28:05.000]

Geek+ (极智嘉) Chinese warehouse robot company, publicly listed. Why mentioned: Cited alongside Hairo and Kiva Systems as examples of successful 2B robotics companies that solved standardized logistics automation, with remaining manual workflows addressable by humanoid robots.

Kiva Systems Amazon-acquired warehouse automation company. Why mentioned: Cited as one of the most successful 2B robots historically, having solved standardized warehouse tasks — used as benchmark for what successful B2B robotics looks like.

Apptronik US humanoid robot company. Why mentioned: Listed as a Google Gemini Robotics partner for ER1.6 deployment; Peter previously noted delays in their Google development collaboration.

Liminal (蚂蚁/Ant's Limbo VA team) Ant Group's embodied AI team. Why mentioned: Cited as a data point that peak Embodied AI training compute is approximately at the 10,000-GPU-card level — not large by frontier model standards.

"Ant's Limbo VA team is probably also around the ten-thousand-GPU-card level." [01:36:26.020]

Ronda AI US world model startup. Why mentioned: Announced $450M raise during GTC — was operating in stealth for a year, now public. Cited as evidence of the world model investment boom.

DeepSeek Chinese AI lab. Why mentioned: Cited as one of the few organizations (alongside closed-source frontier labs) that could realistically win the top-tier proprietary embodied AI model position — because founder Liang Wenfeng has sufficient personal capital to fund a long-horizon research company without commercialization pressure.

"I've always thought DeepSeek is a very rare and very different kind of company — when you are successful enough and have sufficient conviction, you can fund this the way DeepMind and OpenAI were originally funded by billionaires." [01:45:17.190]

OpenAI Why mentioned: Formally announced expansion of robotics team in Q2 led by Aditya Ramesh; team grew from the Sora/DALL-E generative AI group, which OpenAI internally positioned as a world model. Peter expects OpenAI to pursue a world-model-aligned robotics approach given this lineage.

Origin Flow (原生科技) Chinese company developing a tendon-driven dexterous hand solution that fits entirely within the palm/hand (no forearm component). Why mentioned: Represents an alternative tendon-driven architecture that avoids the forearm integration challenge.

Yuanfang Weilai (原测未来) New robotics startup founded by Li Hongyang (李鸿阳). Why mentioned: Specifically founded to work on combining locomotion and manipulation — cited as evidence that loco-manipulation is becoming a dedicated research and startup category.

Manifold Space (流形空间) Chinese world model startup. Why mentioned: Listed among the fast-growing world model startups in China's current investment boom.

Inverse Matrix (逆矩阵) Chinese world model startup. Why mentioned: Listed among the fast-growing world model startups.

Pattern Star (模式星空) Chinese world model startup. Why mentioned: Listed among the fast-growing world model startups.

Jihe World (极家世界) Chinese world model company founded in 2023 — the oldest in the current cohort. Why mentioned: Described as currently the highest-valued of the Chinese world model startups; pre-dates the current boom.

Libera EH (LiberAEH) Chinese embodied AI/world model company. Why mentioned: Listed among the notable companies in the current world model investment wave.

NVIDIA GearLab NVIDIA's internal robotics research lab. Why mentioned: Produced Dream Zero (predecessor work using the WAN open-source video model), and the GearLab team leader Zhou Zhang (Korean researcher) subsequently left to found an independent world model startup in the US.

강원 (Dream Zero team / Zhou Zhang's new company) New US-based world model startup founded by former NVIDIA GearLab leader Zhou Zhang. Why mentioned: Cited as one of the new international world model companies formed in the current wave.

Anthropic Why mentioned: Cited repeatedly as the canonical example of a company that couldn't answer "how will you compete with incumbents" at founding but became a dominant player anyway — used to contextualize why inability to answer the competitive question at founding doesn't preclude success.

Sievert (竞争者) Mentioned briefly in the context of prior-generation robotics acquisitions in China at unfavorable prices.


4. People Identified

Chen Zhe (陈哲 / Peter) Founding Partner of AlphaEast. Why mentioned: Guest and primary analyst throughout; previously invested in Hairo and XYZ; attended ICRA Vienna; deep network across US and Chinese embodied AI research and startup communities. His background as a former researcher lends technical credibility to his investment perspective.

Peter Florence Co-founder/researcher at Generalist. Why mentioned: Wrote a public article explicitly rejecting the "world model" and "VLA" labels for Generalist's approach, arguing their model is trained purely on native physical interaction data — a philosophically distinct position in the field.

"Peter Florence specifically wrote an article saying their approach is neither grafting actions onto a VLM to make a VLA, nor is it a world model — rather, their model is trained entirely on data native to physical interaction." [01:27:17.140]

Aditya Ramesh Leader of OpenAI Robotics team; creator of DALL-E; NYU undergraduate (no PhD). Why mentioned: His internal team was called "World Simulation Research," and the robotics team was built on the foundation of the DALL-E/Sora generative AI group — signaling OpenAI's world-model-centric approach to embodied AI.

"OpenAI's robotics team was built on the foundation of their DALL-E and Sora — image and video generation team. Ramesh's internal team name on Twitter is 'World Simulation Research.'" [01:34:01.280]

Tony Zhao / Zipeng (子鹏) Stanford researchers who published the Aloha work in 2023 (along with Sander Adewumi). Why mentioned: Credited as originators of the Aloha teleoperation data collection paradigm that launched the modern embodied AI data collection era.

"Tony and Zipeng published the Aloha work in 2023." [00:30:55.440]

Li Fan (李帆) Co-founder of Sharpard. Why mentioned: Identified as having a long-term vision beyond dexterous hand hardware — building toward general embodied intelligence, including a three-tier model architecture and demonstrations of fine manipulation tasks like assembling pinwheels and peeling apples.

"Li Fan and his team's long-term goal is to build general embodied intelligence, entering through the hand." [00:39:40.380]

Li Hongyang (李鸿阳) Founder of Yuanfang Weilai (原测未来). Why mentioned: Specifically founded a new company dedicated to loco-manipulation — combining locomotion and manipulation in a unified training architecture — cited as evidence this has become a standalone research and startup category.

Xu Huazhe (许化哲) Chinese embodied AI founder/researcher (recently interviewed by host Manchi). Why mentioned: Holds the contrarian view — unusual in China — that not all capabilities need to be in-house and that some should be delegated to ecosystem partners. Peter characterizes this as "a very American-style idea."

"Xu Huazhe has a very non-mainstream view — he believes not everything should be done in-house; some things should be given to the ecosystem. I think this idea is probably right, and it's probably also a very American-style idea." [01:50:14.410]

Gao Shenyuan (高深远) Researcher at NVIDIA GearLab who worked on Dream Zero. Why mentioned: Peter spoke directly with him and received a detailed explanation of Dream Zero as a "world verifier" — a model that predicts how physical actions change world state — rather than a pure video generation model.

"I previously spoke with Gao Shenyuan, a GearLab researcher. He described Dream Zero as a world verifier — its input is your action, what change you want to produce, and the world model predicts what the resulting state will be." [01:10:09.520]

Liang Wenfeng (梁文峰) Founder of DeepSeek. Why mentioned: Cited as one of the rare archetypes of a founder who has sufficient personal capital and conviction to build a long-horizon research company without commercialization pressure — the model Peter sees as necessary for embodied AI but nearly impossible to replicate in China's ecosystem.

"I've always thought DeepSeek is a very rare and very different kind of company — when you are successful enough and have sufficient conviction in this work, and objectively OpenAI and DeepMind were originally billionaires' bets and investments." [01:45:17.190]

Wu Yufei (吴雨飞) Dancer from Chengdu, Sichuan. Why mentioned: Performed a live dance with eight Unitree humanoid robots on America's Got Talent, receiving unanimous judge advancement — cited alongside Figure AI's livestream as a moment that educated American mainstream audiences about humanoid robot capabilities in a visceral, non-technical way.

Elon Musk Why mentioned: Cited for his first-principles belief in tendon-driven dexterous hands (mimicking human biology); and for his claim that dexterous hands account for approximately 50% of Optimus's total engineering investment — underscoring the technical difficulty and strategic centrality of this component.

"Musk has always believed in the first-principles approach, thinking tendon-driven is closer to human biological characteristics and therefore the better direction. He's quoted saying dexterous hands may account for 50% of their entire company's engineering investment." [01:01:35.580]


5. Operating Insights

The "Liquidity Window" Framing for Hardware Startups: IPO or Lose Your Position

Peter's investment framework for pure hardware body companies is stark: the 2026 IPO window is not optional. The logic is that large OEMs (Honor, Xiaomi, Li Auto) have superior manufacturing scale and reliability track records and will enter the market in 2027. A hardware startup that has not secured its resource base through a public listing before that happens faces existential crowding.

"If you can't achieve this [IPO], then by next year — 2027 — large manufacturers like Honor with more resources, along with smartphone and auto OEMs, will have their robot bodies entering the market. And these companies have much more experience in large-scale manufacturing, reliability, and consistency." [00:09:43.960]

The F1 Racing Analogy: Use Public Competitions as Engineering Forcing Functions

Peter's reframe of the Humanoid Marathon as an "F1 for robots" is operationally useful: high-profile competitions create extreme, artificial specifications that force teams to solve reliability, thermal management, and system integration problems that no commercial RFP would ever specify. The engineering learnings (e.g., Honor's liquid cooling system) transfer directly to production.

"Like F1 racing, which was never meant to sell cars but rather to push technical boundaries, and many of those technologies gradually make their way into mass-production vehicles. The marathon created an extreme test environment — when you understand the capability boundaries of humanoid robots thoroughly, you have a large body of systematic design experience to draw on when you scale to production." [00:13:31.320]

Scene Selection as Competitive Moat for B2B Robot Deployment

Peter lays out four criteria for identifying good humanoid robot deployment scenes: (1) non-standard, flexible objects that traditional robot arms failed to handle; (2) high throughput and volume; (3) a consistent operating environment; (4) tasks involving deformable materials or unpredictable corner cases that require generalization. The combination of deformable materials and bimanual operation is where new generative models deliver a genuine step-change over the prior generation of industrial arms.

"The most historically successful B2B robots were all in scenarios with large-scale deployment volume, sufficiently high throughput, and relatively consistent operating environments and task requirements." [00:23:13.760]

"Figure and Xinmojiyu chose this scene very cleverly, because it is genuinely a problem that old technology stacks and non-humanoid, non-dual-arm-plus-dexterous-hand solutions struggled to solve smoothly." [00:20:21.820]
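The four conditions listed above can be read as a simple screening checklist. The sketch below is only an illustration of that reading: the `DeploymentScene` class, its field names, the rule that all four conditions must hold, and the two example scenes are assumptions made for this example, not something stated in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentScene:
    """Hypothetical record describing one candidate deployment scene."""
    name: str
    non_standard_objects: bool    # (1) flexible items traditional arms failed to handle
    high_throughput: bool         # (2) high throughput and volume
    consistent_environment: bool  # (3) consistent operating environment
    needs_generalization: bool    # (4) deformable materials or unpredictable corner cases


def criteria_met(scene: DeploymentScene) -> int:
    """Count how many of the four conditions the scene satisfies."""
    return sum([
        scene.non_standard_objects,
        scene.high_throughput,
        scene.consistent_environment,
        scene.needs_generalization,
    ])


def is_strong_candidate(scene: DeploymentScene) -> bool:
    """Assumed rule: a scene is a strong candidate only if all four hold."""
    return criteria_met(scene) == 4


# Illustrative, made-up scenes (not taken from the episode).
scene_a = DeploymentScene("scene A", True, True, True, True)
scene_b = DeploymentScene("scene B", False, True, True, False)

for scene in (scene_a, scene_b):
    print(scene.name, criteria_met(scene), is_strong_candidate(scene))
```

Requiring all four conditions is the strictest reading; a weighted score would be an equally plausible alternative if some conditions matter more than others.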

The Dexterous Hand Standardization Race: Back the Company That Becomes the "Unitree of Hands"

Peter draws a direct parallel between Unitree's role in creating an open hardware ecosystem for humanoid locomotion research (enabling Sonic retargeting, third-party algorithms, etc.) and the emerging opportunity for a dexterous hand company to occupy the same role. The company that achieves low cost, high reliability, and sufficient openness will become the de facto research standard — and accumulate all the software/algorithm development as a moat.

"Fino today is positioned somewhat like Unitree — focused on producing a low-cost, high-reliability, stable hardware device that is reliable and durable enough for people to do large amounts of research and experimentation on top of it. This is why in the past one to two months we have seen very many companies — both in the US and China — releasing dexterous manipulation models or work based on Fino's dexterous hand." [00:41:07.860]


6. Overlooked Insights

The Remote Supervision Model for Industrial Robots Is the Inevitable Near-Term Architecture — And It's a Separate Business

Peter notes, almost in passing, that industrial humanoid deployments will likely require a human supervisor managing multiple robots remotely, analogous to Waymo's remote takeover model and measured by "takeover efficiency." The remark implies a separate B2B service layer (robot supervision-as-a-service) that nobody in the conversation identifies as a standalone opportunity. The analogy to Robotaxi remote operations centers, which became multi-hundred-million-dollar businesses in their own right, suggests this is a significant, underdiscussed category.

"Just like the remote takeover system for Robotaxi — when we evaluate Waymo and other domestic Robotaxi companies, we measure their takeover efficiency. I think in industrial and logistics settings, you can have a person in the back watching... if it truly enters home settings, there is a privacy issue, which means higher requirements for full autonomy." [00:22:14.460]
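The episode does not define "takeover efficiency," so the following is only a back-of-the-envelope sketch of one way to reason about it. It assumes each robot requests a fixed number of takeovers per hour and that each takeover occupies the supervisor for a fixed number of minutes. The function names, parameters, and the 50% load cap are hypothetical choices for illustration.

```python
import math


def supervisor_load(num_robots: int,
                    takeovers_per_robot_hour: float,
                    minutes_per_takeover: float) -> float:
    """Fraction of one supervisor's time spent handling takeovers (assumed model)."""
    return num_robots * takeovers_per_robot_hour * minutes_per_takeover / 60.0


def max_robots_per_supervisor(takeovers_per_robot_hour: float,
                              minutes_per_takeover: float,
                              max_load: float = 0.5) -> int:
    """Largest fleet one supervisor can watch while staying under max_load.

    max_load is a hypothetical cap on how busy the supervisor may be.
    """
    busy_minutes_per_robot_hour = takeovers_per_robot_hour * minutes_per_takeover
    if busy_minutes_per_robot_hour <= 0:
        raise ValueError("takeover rate and duration must be positive")
    return math.floor(max_load * 60.0 / busy_minutes_per_robot_hour)


# Illustrative numbers only, not figures from the episode.
print(supervisor_load(10, 0.5, 3))        # share of supervisor time for 10 robots
print(max_robots_per_supervisor(0.5, 3))  # fleet size at a 50% load cap
```

Under this model, halving either the takeover rate or the time per takeover doubles the number of robots one person can supervise, which is one reason such a metric would matter commercially.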

EMG Signal Capture + Algorithm Advances May Make the Dexterous Hand Data Problem Solvable Without Proprietary Hardware

Peter notes almost in passing that EMG (electromyography) signal capture is attracting renewed attention. The technology has existed for decades (e.g., the Myo armband from a University of Waterloo spinout in Canada, and BrainCo's work in Hangzhou), but modern ML training and fitting can now extract meaningful joint-level data from EMG signals that were previously too noisy to use. This is not covered as a primary theme, yet it represents a potentially disruptive data collection pathway: if EMG wristbands can reliably capture fine hand motor data without requiring proprietary dexterous hardware to be worn, they could break the data-hardware dependency loop Peter identifies as the core bottleneck. That would specifically benefit third-party data companies, the business model Peter otherwise says is structurally weak for dexterous hands.

"Using EMG to control dexterous hands, or using EMG to capture human hand data, is not a new idea — more than a decade ago in Canada there was a company that launched the Myo armband... The reason people are paying more attention to this recently is that with advances in algorithms, our ability to collect large amounts of data and perform model training and fitting has significantly improved. So people are very curious whether new EMG methods can more accurately reconstruct each joint's position and operation." [00:56:42.760]