Yiyang Du
Yiyang Du is a PhD researcher at Carnegie Mellon University's Language Technologies Institute whose work focuses on vision-language models and embodied AI. He is best known for EmbodiedMidtrain, a mid-training pipeline that bridges the distribution gap between Vision-Language Models (VLMs) and Vision-Language-Action Models (VLAs) by selecting VLM samples that align with VLA data distributions. He has also contributed to research on model composition for multimodal large language models, published at ACL 2024. His collaborations include work with Bosch Research North America and the Bosch Center for Artificial Intelligence.
“EmbodiedMidtrain inserts a lightweight 'alignment' step between VLM pretraining and VLA fine-tuning that costs a fraction of normal training compute but consistently delivers performance competitive with models 3–8x larger.”
“MMD distances are generally smaller within the VLM group and within the VLA group than across the two groups, quantitatively confirming a clear distributional mismatch (Section 3).”
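To make the within-group versus across-group comparison concrete, here is a minimal sketch of a Maximum Mean Discrepancy (MMD) estimate. Everything in it is an assumption for illustration: the Gaussian (RBF) kernel, the bandwidth `gamma`, the helper names, and the synthetic Gaussian "features" standing in for VLM and VLA data. The source does not state which kernel, features, or estimator the paper uses.

```python
import math
import random

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y)."""
    return sum(rbf(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2(xs, ys, gamma=0.1):
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))

def sample(rng, mean, n=60, dim=4):
    """Hypothetical stand-in for dataset features: i.i.d. Gaussian vectors."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
vlm_a = sample(rng, 0.0)  # two synthetic "VLM-like" datasets, same distribution
vlm_b = sample(rng, 0.0)
vla_a = sample(rng, 1.5)  # one synthetic "VLA-like" dataset, shifted mean

within = mmd2(vlm_a, vlm_b)
across = mmd2(vlm_a, vla_a)
print(f"within-group MMD^2: {within:.4f}")
print(f"across-group MMD^2: {across:.4f}")
```

On this toy data the across-group value comes out larger than the within-group value, which is the pattern the quote describes. The numbers themselves say nothing about the real datasets.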