Jianke Zhang
Jianke Zhang is a researcher at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University, where he works on robot learning, vision-language models (VLMs), and multimodal learning. He is best known for his work on Vision-Language-Action (VLA) models, including UP-VLA, a unified understanding and prediction model for embodied agents, and VLM4VLA, an empirical framework for benchmarking vision-language models as backbones for robotic policies. His research has been published at major machine learning and robotics conferences including ICML, NeurIPS, CoRL, and ICLR.
“we initialize a VLA from the resulting VLMs and fine-tune it following the VLA training pipeline of VLM4VLA (Section 5.1).” Understanding EmbodiedMidtrain requires understanding VLM4VLA, since the two are tightly coupled.
Source: AI-extracted from podcast / newsletter / paper summaries. May contain errors.