PhysVLA
Physics-correction middleware for Vision-Language-Action (VLA) models that operates at inference time, with no retraining and no access to model weights
“The core contribution is a plug-and-play wrapper that sits between a frozen VLA's predicted action and the robot's controller, applying physics corrections without touching model weights, retraining, or even accessing internal weights.”
Source→“PhysVLA adds negligible inference cost. On a single RTX 4090 the per-step overhead of Channels A+B sums to ≈ 0.6 ms”
Source→“end-to-end placement success rises from 45% under the Baseline to 95% under PhysVLA, and mean trajectory jerk drops from ≈ 0.05 to ≈ 0.005 (~10× smoother executions)”
Source→“The design philosophy is encoded in a blending cap: "a_t = (1−c) * a_VLA + c * a_phys, c = 0.05" (Eq. 2). The executed action is 95% the VLA's own prediction, refined by a 5% physics correction.”
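A minimal sketch of the blending step in Eq. 2, assuming actions are plain NumPy vectors. The function name `blend_action` and the example values are hypothetical and are not taken from the PhysVLA source; only the formula and c = 0.05 come from the quote above.

```python
import numpy as np

def blend_action(a_vla, a_phys, c=0.05):
    """Blend a VLA action with a physics-corrected action (Eq. 2).

    Computes a_t = (1 - c) * a_VLA + c * a_phys.
    Illustrative helper only; the name and signature are assumptions.
    """
    a_vla = np.asarray(a_vla, dtype=float)
    a_phys = np.asarray(a_phys, dtype=float)
    if a_vla.shape != a_phys.shape:
        raise ValueError("a_vla and a_phys must have the same shape")
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return (1.0 - c) * a_vla + c * a_phys

# Hypothetical 3-D action vectors, chosen only to illustrate the arithmetic.
a_vla = np.array([0.10, -0.20, 0.30])   # frozen VLA prediction
a_phys = np.array([0.00, 0.00, 0.50])   # physics-corrected proposal
a_t = blend_action(a_vla, a_phys)       # executed action with c = 0.05
```

Because a_t − a_VLA = c · (a_phys − a_VLA), the executed action can differ from the VLA's prediction by at most 5% of the gap between the two proposals when c = 0.05.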
Source→AI-extracted from podcast / newsletter / paper summaries. May contain errors.