Gao Shenyuan
Shenyuan Gao is a final-year PhD student at the Hong Kong University of Science and Technology (HKUST) who completed a research scientist internship at NVIDIA's GEAR Lab under Jim Fan and Yuke Zhu. He is best known as co-first author of DreamDojo, a generalist robot world model pretrained on 44,000 hours of egocentric human video and accepted at ICML 2026, which introduces continuous latent actions to transfer interaction knowledge from unlabeled video to robot control. His broader research focuses on scaling foundation world models for general-purpose robots, and he is on the academic job market for mid-2026.
“Now in this loop there are three parts: a general Agent Policy, then the World Model... everyone is pushing toward generalization. So I think at some point, possibly even this year, once this loop connects and error accumulation drops to an acceptable level, the whole loop will become simpler and simpler, approaching something like self-evolution.”
Source→“DreamDojo is a relatively general-purpose pretrained world model. We open-sourced it so that anyone with a new robot can quickly connect to our world model, fine-tune it, and use it.”
Source→“Final-year PhD student at HKUST, joining NVIDIA GEAR Lab full-time. Co-first author of DreamDojo and DreamZero.”
Source→“The most impressive and the one I most want to follow is Google DeepMind. They very typically push everything to align with foundation models. Your agent aligns with Gemini, your VLA aligns with Gemini, your world model starts from Veo... always aligning action/decision data toward the most data-rich modalities.”
Source→“Hassabis — I really believe in his framework. His thinking aligns closely with mine: a world model in video space, a general agent called SIMA, and together they form a self-evolving loop.”
Source→“OpenAI's Sora team was restructured under the Robotics Lab. So I think this year will be quite competitive. It seems they're seriously working on producing something in world models.”
Source→“Professor Li Fei-Fei's World Labs — they may be more focused on games. Using explicit 3D representations for games has advantages, including possibly for autonomous driving. But for robots, video is probably better.”
Source→“Jim Fan and Yuke Zhu's research taste and style matched mine quite well. At the time I also really wanted to collaborate with them.”
Source→AI-extracted from podcast / newsletter / paper summaries. May contain errors.