Yijie Zhu
Yijie Zhu is a graduate researcher at Harbin Institute of Technology, Shenzhen, who is also affiliated with Great Bay University in Dongguan, China. He is best known as the lead author of ΔVLA, a prior-guided vision-language-action framework for robotic manipulation published on arXiv in March 2026. ΔVLA models world-knowledge variations relative to an explicit current-world prior in order to improve performance on long-horizon tasks. His research centers on multimodal large language models and embodied AI, with a focus on unifying perception, reasoning, and control for real-world robot systems.
“Instead of predicting what the future looks like, ΔVLA predicts how the world changes, and that shift in framing delivers state-of-the-art manipulation performance with 3x faster training than comparable approaches.”
Source→“The quality of an action is determined by the variation it induces rather than the absolute future state... modeling variation has long been a standard technique in many areas, as emphasizing differences can stabilize prediction and highlight transitions.”
Source→“ΔVLA attains an average success rate of 72% on Galaxea R1 Lite and 69% on AgileX Cobot Magic... DreamVLA: 53% and 49% respectively.”
Source→“For simulation, we build on OpenVLA as the backbone... fine-tuned using Low-Rank Adaptation (LoRA) with rank 32.”
Source→“Inspired by Genie, we propose the Latent World Variation Quantization (LWVQ) module to encode world-knowledge variations in a fully unsupervised manner.”
Source→“π₀ [RSS'25]: 94.2% LIBERO average, 67.4% RoboTwin 2.0 average”
Source→AI-extracted from podcast / newsletter / paper summaries. May contain errors.
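The source quote on modeling variation rather than the absolute future state can be illustrated with a toy sketch. It assumes world states are plain feature vectors, and `delta_target` and `apply_delta` are hypothetical helper names; this is not ΔVLA's code, only the general idea of predicting a change relative to the current state and adding it back.

```python
from typing import List

Vector = List[float]


def delta_target(current: Vector, future: Vector) -> Vector:
    """Training target under variation modeling: the change, not the absolute future state."""
    if len(current) != len(future):
        raise ValueError("state vectors must have the same length")
    return [f - c for c, f in zip(current, future)]


def apply_delta(current: Vector, delta: Vector) -> Vector:
    """Recover a predicted future state by adding the variation to the current-state prior."""
    if len(current) != len(delta):
        raise ValueError("state and delta must have the same length")
    return [c + d for c, d in zip(current, delta)]


if __name__ == "__main__":
    current = [0.5, 1.0, 2.0]  # hypothetical features of the current scene
    future = [0.5, 1.25, 2.0]  # only the second feature changes (e.g. a gripper moves)
    delta = delta_target(current, future)
    print(delta)                        # [0.0, 0.25, 0.0]
    print(apply_delta(current, delta))  # [0.5, 1.25, 2.0]
```

Where the scene is static the delta is zero, so the nonzero entries isolate the transition, which is one way to read the quote's claim that emphasizing differences "highlight[s] transitions".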
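The source quote on fine-tuning OpenVLA with LoRA at rank 32 can be made concrete with a minimal sketch of a LoRA-adapted linear layer and its parameter cost. It assumes the standard LoRA form y = Wx + (alpha / r)·B(Ax) with W frozen; the scaling factor `alpha` and the 4096 x 4096 projection size are illustrative assumptions not stated in the source, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / rank) * B (A x).

    W (d_out x d_in) stays frozen; only A (rank x d_in) and B (d_out x rank) are trained.
    """
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y0 + scale * u for y0, u in zip(base, update)]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    full = 4096 * 4096  # dense params in one assumed 4096 x 4096 projection
    added = lora_param_count(4096, 4096, 32)
    print(added, full, added / full)  # 262144 16777216 0.015625
```

With B initialized to zero, as in standard LoRA, the adapted layer starts out identical to the frozen one. At rank 32 each adapted 4096 x 4096 matrix trains 1/64 of its dense parameter count, which is why LoRA is a cheap way to specialize a large backbone.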
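The source quote on the Genie-inspired Latent World Variation Quantization (LWVQ) module suggests a vector-quantization step applied to changes in the latent world state. The sketch below shows only the nearest-codebook lookup used in VQ-VAE-style models, under assumptions that are not from the source: latents are short vectors, the variation is a plain difference of consecutive latents, and the codebook is a fixed toy one. In an actual unsupervised setup the codebook would be learned, and `quantize_variation` is a hypothetical name.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(z: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return the index and vector of the codebook entry nearest to z (VQ-style lookup)."""
    if not codebook:
        raise ValueError("codebook must not be empty")
    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


def quantize_variation(z_t: Vector, z_next: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Hypothetical LWVQ-like step: snap the change between two latent states to a discrete code."""
    variation = [b - a for a, b in zip(z_t, z_next)]
    return quantize(variation, codebook)


if __name__ == "__main__":
    # Toy codes standing in for "no change", "move along x", "move along y".
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    idx, code = quantize_variation([0.2, 0.3], [1.1, 0.4], codebook)
    print(idx, code)  # 1 [1.0, 0.0]
```

Discretizing the variation gives each transition a compact token from a finite vocabulary. Training such a module end to end would also need a straight-through gradient estimator and codebook/commitment losses, which this lookup-only sketch omits.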