frax: Fast Robot… | arXiv Physical AI Research Summary

Stanford University | Morton & Pavone | arXiv 2604.04310 | April 2026

1. Key Themes

One Codebase, Three Hardware Targets — Without the Performance Tax

The fundamental problem frax solves is the forced choice between fast CPU execution and GPU scalability. Today, teams typically maintain separate codebases: optimized C++ for real-time control loops and Python/JAX for training pipelines. frax collapses this into a single pure-Python library that runs competitively on both CPU and GPU and, through XLA, also targets TPU.

"While existing libraries often excel at either low-latency CPU execution or high-throughput GPU workloads, few provide a unified framework that targets multiple architectures without compromising performance or ease-of-use." (Abstract)

The practical result: a differential IK controller on a Franka Panda runs in 6 microseconds on CPU, and the same library scales to 100 million dynamics evaluations per second on GPU. (Table II; Section III-B)

Python Speed Approaching C++ — A Genuine Engineering Breakthrough

The conventional wisdom in robotics infrastructure is that Python is for prototyping and C++ is for deployment. frax challenges this directly. By using JAX's JIT compilation, XLA operator fusion, and multithreaded linear algebra, it closes most of the gap with compiled C++.

"In comparison to the Python APIs for Pinocchio and MuJoCo, frax fares very well, with around a 2-3x speedup... For the best possible performance on CPU, a controller written in C++ and compiled directly with the C++ interfaces to Pinocchio and MuJoCo will beat out frax, but this gap can be quite close, as seen in the G1 OSC timings." (Section V)

For humanoid control specifically, the G1 operational space controller runs in 42 microseconds in Python with frax. That is well within a kilohertz control budget, and the authors have already used frax for kilohertz-rate control on real hardware in prior work ([24], Morton & Pavone, IROS 2025).

Automatic Differentiation Through Dynamics — The Missing Piece for Modern Robot Learning

Many dynamics libraries treat AD as an afterthought; frax makes it native. This matters for model-predictive control (MPC), trajectory optimization, and safety-critical control with control barrier functions (CBFs), where you need gradients of complex kinematic and dynamic expressions.

"Integration with JAX allows users to automatically differentiate through complex functions of the robot kinematics and dynamics... The Lie derivatives required to construct the CBF constraint... is itself a JVP, and thus dh/dz does not need to be computed in full. This can be particularly beneficial for high-DOF systems like the Unitree G1." (Sections IV, IV-A)

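To make the JVP point concrete, here is a minimal stdlib-only sketch of forward-mode AD with dual numbers. It assumes a toy barrier h(z) = 1 - |z|² and a constant drift f(z); both are hypothetical stand-ins, not frax's API, and the `jvp` helper below only mimics the (primal, tangent) contract of `jax.jvp`. One forward pass yields both h(z) and the Lie derivative L_f h(z) = (dh/dz) f(z), without ever materializing dh/dz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A forward-mode AD value: primal + tangent * eps, where eps**2 == 0."""
    primal: float
    tangent: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a b' + a' b) eps
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def lift(x):
    """Promote plain numbers to Duals with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x))


def jvp(fun, primals, tangents):
    """Return (fun(primals), (d fun / d primals) . tangents) in one forward pass."""
    out = fun([Dual(p, t) for p, t in zip(primals, tangents)])
    return out.primal, out.tangent


def barrier(z):
    """Hypothetical CBF: h(z) = 1 - |z|^2, so the safe set is the unit ball."""
    total = 0.0
    for zi in z:
        total = total + zi * zi
    return 1.0 - total


def drift(z):
    """Hypothetical drift vector field f(z), held constant for clarity."""
    return [1.0, 2.0, -1.0]


z = [0.3, 0.4, 0.0]
h_val, lie_f_h = jvp(barrier, z, drift(z))  # h(z) and L_f h(z) together
print(h_val, lie_f_h)
```

Forming the full gradient in forward mode would take one pass per state dimension; the Lie derivative needs only one, which is why the saving grows with DOF on a platform like the G1. In frax, the same contraction comes from `jax.jvp` applied to functions of the robot kinematics and dynamics.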
The vectorized formulation also sharply reduces JIT compilation times: 1-2 seconds for frax versus 6-12 seconds for MJX/BRAX in the paper's tests, roughly a 6x improvement that directly shortens iteration time during controller development. (Section V)

Vectorization as Architectural Philosophy — Not Just an Optimization

The authors made a deliberate algorithmic tradeoff: increase computational complexity from O(n) to O(n²) to eliminate recursive loops. This is counterintuitive, but it pays off on CPU, on GPU, and under automatic differentiation alike.

"By avoiding a recursive loop-based structure, we intentionally increase the complexity of RNEA from O(n) to O(n²)... However, the overhead of redundant O(n²) computations is vastly outweighed by the performance gains from the XLA compiler's ability to make use of fine-grained parallelism within the CPU, GPU, or TPU." (Section III-A)

The evidence is stark in Table IV: loop-based CRBA on the G1 takes 118 microseconds vs. 9.5 microseconds for vectorized CRBA on CPU, roughly a 12x difference. On GPU, the gap widens to 1,149 microseconds vs. 26 microseconds, a 44x difference.
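The tradeoff can be seen in miniature. The sketch below (plain Python, hypothetical names) sums a per-link scalar, such as link mass, over each link's subtree in two ways: a sequential leaf-to-root loop, which is O(n), and a single product with an ancestor mask U, which is O(n²) but has no sequential dependencies. The mask convention (U[i][j] = 1 when link i is link j or one of its ancestors) and the topological ordering (parent[j] < j) are assumptions, not necessarily the paper's exact definitions; real CRBA accumulates 6x6 spatial inertias with frame transforms, which this omits.

```python
def ancestor_mask(parent):
    """U[i][j] = 1 if link i is link j or one of its ancestors (assumed convention)."""
    n = len(parent)
    U = [[0] * n for _ in range(n)]
    for j in range(n):
        i = j
        while i != -1:          # walk from link j up to the root
            U[i][j] = 1
            i = parent[i]
    return U


def subtree_sum_loop(parent, values):
    """Loop-based pass: O(n) work, but each step depends on earlier ones."""
    acc = list(values)
    for j in reversed(range(len(parent))):   # children before parents (parent[j] < j)
        if parent[j] != -1:
            acc[parent[j]] += acc[j]
    return acc


def subtree_sum_masked(U, values):
    """Vectorized pass: O(n^2) multiply-adds with no sequential dependency."""
    return [sum(u * v for u, v in zip(row, values)) for row in U]


# Hypothetical 5-link tree: link 0 is the base, with branches 0-1-2 and 0-3-4.
parent = [-1, 0, 1, 0, 3]
masses = [5.0, 1.0, 0.5, 1.0, 0.5]
U = ancestor_mask(parent)
print(subtree_sum_loop(parent, masses))
print(subtree_sum_masked(U, masses))
```

In JAX, the masked version would collapse to one `U @ masses` matrix product that XLA can spread across CPU cores or GPU threads, while the loop forces n dependent steps that also have to be unrolled and traced under AD.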

2. Contrarian Perspectives

Simulation Environments Are Not the Right Tool for Controller Design

The robotics community has largely converged on GPU-accelerated simulators (Isaac Lab, MJX, BRAX) as the primary training infrastructure. frax argues this is the wrong abstraction for a large class of controller design problems.

"Existing JAX libraries for robot dynamics (MJX, BRAX) tend to optimize for GPU performance, leading to limited applicability for single-robot control and planning on CPU. This is especially noticeable for inverse dynamics methods on the G1, which can lead to significant spikes in controller compute times (on the order of 1ms as opposed to 40μs with frax)." (Section V)

A 25x latency spike in your controller from using the wrong library is not a benchmark curiosity — it's the difference between a working real-time system and one that misses control deadlines. The contrarian claim: physics simulators and dynamics libraries are fundamentally different tools, and conflating them costs real performance.

The Python-to-C++ Rewrite for Deployment Is Becoming an Unnecessary Tax

The standard robotics engineering workflow — prototype in Python, rewrite in C++ for production — is increasingly unjustified. frax demonstrates that a pure-Python library can achieve kilohertz-rate control on real hardware.

"frax provides a much more flexible Python-based AD-compatible interface that can actually be deployed on hardware for kilohertz-rate control (as in [24])." (Section VI-A)

This has direct organizational implications: teams that spend months rewriting controllers in C++ for deployment may be incurring unnecessary engineering cost. The counterargument, which the paper acknowledges, is that controllers compiled against the C++ interfaces of Pinocchio and MuJoCo still win on raw CPU speed. But the paper also reports that the gap can be quite close (as in the G1 OSC timings), so for many applications Python's flexibility may outweigh it.

Recursive Algorithms Are the Wrong Primitive for Modern Hardware

Classical robotics has treated Featherstone's recursive algorithms (RNEA, ABA, CRBA) as the performance gold standard for decades. frax argues that on modern hardware — where XLA compilers, SIMD units, and GPU tensor cores dominate — vectorized formulations outperform recursive ones despite their higher theoretical complexity.

"While CPUs can efficiently handle tree traversals, GPUs favor operations with fewer sequential dependencies; thus, we take this approach to maintain strong performance across platforms. And, when applying automatic differentiation to the robot dynamics, the vectorized form avoids the cost of tracing gradients through long unrolled loops, for fast evaluation and compilation." (Section III-A)

The data from Table IV supports this: the JIT compilation time for loop-based CRBA on the G1 with GPU JVP is 9.668 seconds versus 0.490 seconds for the vectorized version, nearly a 20x difference. The paper treats compile time as secondary to "hot" call latency in the control loop, but in iterative development it is a cost paid on every change.

3. Companies Identified

Franka Robotics

Description: German robotics company, maker of the Panda 7-DOF research manipulator
Why relevant: The Franka Panda is one of two primary validation platforms for frax, with native support including tuned spherized collision models
Quote: "We validate performance on a Franka Panda manipulator... frax has native support for the Franka Panda and Unitree G1, including tuned spherized collision models and self-collision pairs for both robots." (Abstract; Section IV)

Unitree Robotics

Description: Chinese robotics company, maker of the G1 humanoid and Go-series quadrupeds
Why relevant: The G1 humanoid is the second primary validation platform; frax's 42μs operational space control timing on the G1 makes it directly relevant for Unitree-based humanoid deployments
Quote: "Inverse dynamics [OSC] — Unitree G1: 42.205 μs" (Table II)

Google DeepMind (MuJoCo / MJX)

Description: AI research lab; MuJoCo is the dominant physics simulator for robot learning, MJX is its JAX-accelerated GPU variant
Why relevant: Direct competitive comparison. frax matches MJX on GPU performance but significantly outperforms it on single-robot CPU control (about 40μs for frax vs. spikes on the order of 1ms for MJX/BRAX on G1 inverse dynamics) and compiles roughly 6x faster
Quote: "frax is on par (or slightly faster) than MJX or BRAX... frax compiles in 1-2 seconds versus 6-12 for MJX/BRAX." (Section V)

Google (BRAX)

Description: Google's differentiable physics engine for large-scale rigid body simulation, built on JAX
Why relevant: Direct GPU-performance benchmark competitor; frax matches BRAX throughput while offering superior CPU single-robot performance
Quote: "On GPU, frax is on par (or slightly faster) than MJX or BRAX." (Section V)

INRIA / Willow (Pinocchio)

Description: Open-source C++ rigid body dynamics library, widely considered the fastest CPU dynamics library in robotics
Why relevant: The primary performance benchmark for frax on CPU; frax beats Pinocchio's Python bindings by 2-3x while approaching the performance of controllers compiled directly against Pinocchio's C++ interface
Quote: "Pinocchio [is] typically noted as the fastest... In comparison to the Python APIs for Pinocchio and MuJoCo, frax fares very well, with around a 2-3x speedup." (Sections I, V)

NVIDIA (Isaac Lab)

Description: GPU-accelerated simulation framework for robot learning
Why relevant: Referenced as the leading GPU simulation platform; frax positions itself as complementary (low-level controller within Isaac Lab environments) rather than competitive
Quote: Referenced in the GPU simulation context (Reference [9]); "frax can complement these environments (e.g. a low-level controller for an end-effector action space)." (Section VI-A)

NVIDIA (cuRobo)

Description: NVIDIA's parallelized collision-free robot motion generation library
Why relevant: Referenced as a leading GPU-accelerated motion planning library; represents the GPU-specialized approach that frax bridges with CPU control
Quote: Referenced as [13]: "Sundaralingam et al. [2023] cuRobo: Parallelized collision-free minimum-jerk robot motion generation." (References)

4. People Identified

Daniel Morton

Lab/Institution: Stanford University, Departments of Mechanical Engineering and Aeronautics & Astronautics; NASA Space Technology Graduate Research Opportunity recipient
Why notable: First author of frax; also lead author on the prior IROS 2025 paper demonstrating kilohertz-rate CBF-based safe control on real hardware, the existence proof that this library works beyond simulation
Quote: "Daniel Morton was supported by a NASA Space Technology Graduate Research Opportunity... Safe, task-consistent manipulation with operational space control barrier functions. IROS 2025." (Author notes; Reference [24])

Marco Pavone

Lab/Institution: Stanford University, Departments of Mechanical Engineering and Aeronautics & Astronautics
Why notable: Senior author; one of the leading academic figures in safe autonomy, robot planning, and learning-based control. His lab's focus on bridging planning, control, and learning makes frax a strategic infrastructure piece for a broader research agenda
Quote: "Daniel Morton and Marco Pavone are with the Departments of Mechanical Engineering and Aeronautics & Astronautics, Stanford University." (Author affiliation)

Justin Carpentier

Lab/Institution: INRIA (French National Institute for Research in Digital Science and Technology)
Why notable: Lead developer of Pinocchio, the primary CPU benchmark competitor; also co-authored foundational work on analytical derivatives of rigid body dynamics. His work sets the performance bar frax is measured against
Quote: "Carpentier et al. [2019] The pinocchio c++ library – a fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives." (Reference [1])

Roy Featherstone

Lab/Institution: Independent researcher (formerly ANU, IIT)
Why notable: Author of the foundational rigid body dynamics algorithms (CRBA, RNEA, ABA) that frax implements in vectorized form. Understanding frax requires understanding Featherstone's spatial algebra formulation
Quote: "Rigid body dynamics algorithms are commonly based on Featherstone's formulation... namely, the Composite Rigid Body Algorithm (CRBA), the Recursive Newton-Euler Algorithm (RNEA), and the Articulated Body Algorithm (ABA)." (Section II)

Brian Plancher

Lab/Institution: Harvard / Columbia (referenced work from MIT/Harvard collaboration)
Why notable: Led key prior work on GPU-accelerated rigid body dynamics with analytical gradients (GRiD), the analytical-gradient counterpart to the AD-based approach frax takes
Quote: "Plancher et al. [2022] GRiD: GPU-accelerated rigid body dynamics with analytical gradients." (Reference [20])

5. Operating Insights

For Real-Time Humanoid Control, Library Choice Is a Systems-Level Decision

If you are deploying a humanoid (Unitree G1, or any high-DOF platform) and using MJX or BRAX for your controller, not just your training environment, you may be leaving significant latency on the table. The paper reports inverse dynamics compute times spiking to the order of 1ms with MJX/BRAX versus roughly 40μs with frax on the G1 (42μs for the full operational space controller in Table II).

"This is especially noticeable for inverse dynamics methods on the G1, which can lead to significant spikes in controller compute times (on the order of 1ms as opposed to 40μs with frax)." (Section V)

At kilohertz control rates the entire control period is 1ms, so a 1ms dynamics call consumes the whole budget before state estimation, communication, or anything else runs. Engineering teams evaluating humanoid stacks should benchmark their dynamics library independently of their simulator. The right tool for simulation-based training is not necessarily the right tool for the deployed control loop.
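A quick way to apply this advice is to express each measured latency as a fraction of the control period. The helpers below are a hypothetical sketch; the 50% reserve for estimation, communication, and other work is an assumed design margin, not a figure from the paper. The inputs are the paper's 42.205μs G1 OSC timing (Table II) and the roughly 1ms spikes reported for MJX/BRAX (Section V).

```python
def budget_fraction(compute_us: float, rate_hz: float) -> float:
    """Fraction of one control period consumed by a computation."""
    period_us = 1e6 / rate_hz
    return compute_us / period_us


def fits_budget(compute_us: float, rate_hz: float, reserve: float = 0.5) -> bool:
    """True if the computation leaves at least `reserve` of the period free.

    `reserve` is a hypothetical design margin, not a value from the paper.
    """
    return budget_fraction(compute_us, rate_hz) <= 1.0 - reserve


RATE_HZ = 1000.0  # kilohertz-rate control: 1 ms period
for label, latency_us in [("frax G1 OSC", 42.205), ("MJX/BRAX spike", 1000.0)]:
    frac = budget_fraction(latency_us, RATE_HZ)
    print(f"{label}: {frac:.1%} of period, fits={fits_budget(latency_us, RATE_HZ)}")
```

Measuring worst-case rather than mean latency matters here, since the paper describes the MJX/BRAX slowdowns as spikes.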

Automatic Differentiation Through Dynamics Unlocks Safety-Critical Control Without C++ Expertise

Control barrier functions, MPC, and trajectory optimization all require gradients through robot dynamics. The traditional path, analytical derivatives hand-coded in C++, requires significant expertise and carries a maintenance burden. frax makes this accessible in Python, with performance competitive enough for real deployment.

"Via JAX, frax enables fast robot control and planning with automatic differentiation through arbitrary functions of the kinematics and dynamics. Shown above (Franka Panda): collision and singularity avoidance in an optimization-based inverse dynamics controller, fully through jax.jvp." (Figure 1 caption)

For companies building safety layers on top of learned policies, a rapidly growing segment, frax provides a practical path to CBF-based safety filters without the C++ rewrite. The IROS 2025 paper (Morton & Pavone) provides the existence proof that this works on real hardware.
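To show what a CBF safety filter does with those gradients, here is a minimal sketch for the single-constraint case, where the quadratic program min ||u - u_nom||² subject to a·u + b ≥ 0 has a closed-form solution (a projection onto a half-space). The toy system is hypothetical: a 2D single integrator x_dot = u with barrier h(x) = 1 - |x|², so a = L_g h(x) = -2x and b = L_f h(x) + α h(x), with L_f h = 0 and an assumed gain α = 1. In a frax-based controller, a and b would instead come from JVPs through the robot kinematics and dynamics, and several simultaneous constraints (collision, singularity) would need a real QP solver rather than this closed form.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cbf_filter(u_nom, a, b):
    """Closest control to u_nom satisfying a.u + b >= 0 (single-constraint CBF-QP)."""
    residual = dot(a, u_nom) + b
    if residual >= 0.0:
        return list(u_nom)          # nominal command is already safe
    norm_sq = dot(a, a)
    if norm_sq == 0.0:
        raise ValueError("constraint does not depend on the control input")
    scale = -residual / norm_sq     # step along a just far enough to reach a.u + b = 0
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


# Hypothetical toy system: x_dot = u, h(x) = 1 - |x|^2, alpha = 1.
ALPHA = 1.0
x = [0.6, 0.0]
h = 1.0 - dot(x, x)
a = [-2.0 * xi for xi in x]          # L_g h(x)
b = 0.0 + ALPHA * h                  # L_f h(x) = 0 for a single integrator
u_nom = [1.0, 0.5]                   # nominal command pushes toward the boundary
u_safe = cbf_filter(u_nom, a, b)
print(u_safe)
```

Only the component of u_nom along a is modified; the tangential component passes through untouched, which is the "minimally invasive" property that makes CBF filters attractive on top of learned policies.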

6. Overlooked Insights

JIT Compilation Time Is the Hidden Bottleneck in Robot Learning Infrastructure

The paper buries a finding that has major implications for training pipeline throughput: frax compiles in 1-2 seconds versus 6-12 seconds for MJX/BRAX. The paper frames this roughly 6x difference as "of much lower importance than 'hot' calls," which understates its operational impact.

"frax has the additional benefit of significantly reduced JIT compilation times — while JIT times are of much lower importance than 'hot' calls in the control loop, for these tests, frax compiles in 1-2 seconds versus 6-12 for MJX/BRAX. For iterative development and tuning, this speedup can significantly reduce friction for designers." (Section V)

In practice, when you are running hyperparameter sweeps, debugging controllers, or iterating on reward functions across hundreds of experiments, a 10-second compile overhead per experiment adds up to hours of wall-clock time. Teams building large-scale robot learning infrastructure should treat compilation time as a first-class metric alongside throughput. The vectorized formulation's advantage in JIT time (seen in Table IV, where loop-based CRBA GPU JVP compiles in 9.668s vs. 0.490s for vectorized) is a direct consequence of architectural choices — not incidental.
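The arithmetic behind "adds up to hours" is easy to check. The sketch below uses the midpoints of the paper's reported ranges (1.5s for frax, 9s for MJX/BRAX) as an assumption; the experiment counts and compiles-per-experiment values are hypothetical and should be replaced with a real pipeline's numbers.

```python
def extra_compile_hours(n_experiments: int, compiles_per_experiment: int,
                        slow_compile_s: float, fast_compile_s: float) -> float:
    """Extra compile time, in hours, paid by the slower compiler across a sweep."""
    extra_s = n_experiments * compiles_per_experiment * (slow_compile_s - fast_compile_s)
    return extra_s / 3600.0


# Assumed midpoints of the reported ranges: 6-12 s (MJX/BRAX) and 1-2 s (frax).
SLOW_S, FAST_S = 9.0, 1.5
for n_exp, n_compiles in [(500, 1), (500, 3)]:  # hypothetical sweep sizes
    hours = extra_compile_hours(n_exp, n_compiles, SLOW_S, FAST_S)
    print(f"{n_exp} experiments x {n_compiles} compiles: {hours:.2f} extra hours")
```

Sequential runs pay this overhead in full as wall-clock time; when experiments run in parallel it shows up mostly as extra compute instead, so the practical cost depends on how the sweep is scheduled.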

The Ancestor Mask Is a General Architectural Pattern With Broader Implications

The paper's core algorithmic innovation — replacing recursive tree traversal with a precomputed ancestor mask matrix U — is presented as an implementation detail of frax. But it represents a more general design principle: encode graph structure as static tensor masks, then execute as pure array operations. This is directly analogous to attention masks in transformers.

"The main difference between frax's vectorized approach and traditional recursive approaches comes from the use of an ancestor mask U ∈ {0,1}^(n×n) to encode the tree structure... This construction allows for summations up and down the tree to be performed with matrix multiplication." (Section III)

The implication: this pattern could be applied to contact-rich dynamics, deformable object simulation, or multi-robot kinematic chains — any domain where the underlying structure is a tree or DAG. Teams building custom dynamics engines or physics-informed neural networks for non-standard robot morphologies should study this formulation closely, as it directly enables hardware-accelerated computation on topologies that would otherwise require custom recursive kernels.
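The same mask also handles summations down the tree and disconnected structures. The sketch below (plain Python, hypothetical names, and the assumed convention U[i][j] = 1 when link i is link j or one of its ancestors) computes each link's absolute orientation in a planar chain as the sum of its ancestors' relative joint angles, i.e. U^T q, for a forest of two independent robots. Two roots in the parent array are enough to make U block-diagonal, so the robots never mix, much as an attention mask blocks attention across separate sequences. A true DAG, where a node has several parents, cannot be expressed with a single parent array and would need more care.

```python
def ancestor_mask(parent):
    """U[i][j] = 1 if link i is link j or one of its ancestors (assumed convention).

    parent[j] == -1 marks a root; several roots describe a forest (e.g. two robots).
    """
    n = len(parent)
    U = [[0] * n for _ in range(n)]
    for j in range(n):
        i = j
        while i != -1:          # walk from link j up to its root
            U[i][j] = 1
            i = parent[i]
    return U


def transpose_matvec(U, x):
    """Return U^T x: entry j sums x[i] over every ancestor i of j (including j)."""
    n = len(U)
    return [sum(U[i][j] * x[i] for i in range(n)) for j in range(n)]


# Hypothetical forest: robot A is links 0-1-2, robot B is links 3-4.
parent = [-1, 0, 1, -1, 3]
q_rel = [0.5, 0.25, 0.125, 1.0, -0.5]   # relative planar joint angles (rad)
U = ancestor_mask(parent)
q_abs = transpose_matvec(U, q_rel)      # absolute link orientations
print(q_abs)
```

Summing "up" the tree is U x and summing "down" is U^T x, so both directions reuse the same static mask; only the transpose changes.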