Dexmal is a robotics AI research and development company specializing in low-latency inference optimization for large Vision-Language-Action (VLA) models, with the goal of real-time robot deployment on consumer hardware. Its flagship software frameworks, including Realtime-VLA and FLASH, allow state-of-the-art foundation models to run fast enough for dynamic robotic manipulation tasks (for example, 30 Hz VLA inference with sub-200 ms reaction times) without retraining or new hardware.
“FLASH is a software-level framework that makes the same model run 3x faster at inference time, with almost no degradation in task success — no retraining the main model, no new hardware.”
Source→“FLASH cuts this to an average of 19.1 ms (3.04× speedup) without touching the main model weights.”
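The figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the baseline latency is inferred from the quoted 19.1 ms and 3.04× values rather than stated in the source, and comparing FLASH's latency against the 30 Hz rate assumes both numbers describe the same kind of inference step, which the source does not confirm.

```python
def implied_baseline_ms(optimized_ms: float, speedup: float) -> float:
    """Latency before optimization, inferred from the optimized latency and speedup."""
    return optimized_ms * speedup


def frame_budget_ms(rate_hz: float) -> float:
    """Time available per inference step at a given control rate."""
    return 1000.0 / rate_hz


# Values quoted in the text: 19.1 ms average latency, 3.04x speedup, 30 Hz target.
baseline = implied_baseline_ms(19.1, 3.04)
budget = frame_budget_ms(30.0)

print(f"implied baseline: {baseline:.1f} ms")   # 58.1 ms
print(f"30 Hz budget:     {budget:.1f} ms")     # 33.3 ms
print(f"baseline fits budget:  {baseline <= budget}")  # False
print(f"optimized fits budget: {19.1 <= budget}")      # True
```

Under these assumptions, an inference step of roughly 58 ms would exceed the 33.3 ms available per step at 30 Hz, while 19.1 ms would fit with headroom.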