Novel Algorithms for… | arXiv Physical AI Research Summary

Beker, Geist, Paulus, Martius — University of Tübingen | ICRA 2026 Workshop on Contact-Rich Control and Representation

1. Key Themes

Gradient-Based Optimization for Contact-Rich Robot Control is Blocked by Simulation Infrastructure, Not Algorithms

The paper's opening argument is direct and important: "Generating intelligent robot behavior in contact-rich settings is a research problem where zeroth-order methods currently prevail." This isn't a critique of RL researchers — it's a diagnosis of why even well-funded robotics labs can't efficiently train manipulation policies using gradient information. The bottleneck is simulation: specifically, the collision detection layer produces gradients that are either undefined, discontinuous, or statistically noisy. This paper attacks that root cause directly.

The Industry-Standard Contact Pipeline Has Three Structural Defects That Make It Incompatible with Differentiable Learning

The paper methodically catalogs why existing approaches (used by MuJoCo, PyBullet, Drake) fail for gradient-based methods: "(i) Converting each non-convex surface into a mesh... and then decomposing it into a set of convex meshes... (ii) Running standard collision detection routines like GJK+EPA or SAT... (iii) Building a contact manifold." The failure modes are: witness points are non-unique for parallel face contact (the most common stable configuration in manipulation), polygon clipping routines are non-differentiable due to branching, and broad-phase filtering creates gradient discontinuities as primitive pairs activate/deactivate (Section I). These aren't fixable by patching existing code — they require architectural replacement.
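The third failure mode (pairs switching on and off in the broad phase) can be illustrated with a toy example. The sketch below is not taken from the paper: the activation distance `MARGIN`, the quadratic `pair_penalty`, and the sigmoid gate with temperature `tau` are all hypothetical choices made only to show how a hard on/off switch produces a jump that a smooth gate avoids.

```python
import jax
import jax.numpy as jnp

MARGIN = 0.1  # hypothetical broad-phase activation distance


def pair_penalty(d):
    """Toy contact cost for one primitive pair at separation d (assumption)."""
    return (0.2 - d) ** 2


def hard_gated(d):
    # The pair contributes only once the broad phase has activated it.
    return jnp.where(d < MARGIN, pair_penalty(d), 0.0)


def soft_gated(d, tau=0.01):
    # Sigmoid gate: a smooth stand-in for the on/off switch (assumption).
    gate = jax.nn.sigmoid((MARGIN - d) / tau)
    return gate * pair_penalty(d)


below, above = 0.0999, 0.1001  # just inside / just outside the margin
hard_jump = hard_gated(below) - hard_gated(above)
soft_jump = soft_gated(below) - soft_gated(above)
hard_grads = (jax.grad(hard_gated)(below), jax.grad(hard_gated)(above))
```

With the hard gate, both the cost and its gradient change abruptly across the margin; with the soft gate, the cost varies continuously over the same interval.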

XPSQ: A New Geometry Primitive That Represents Complex Shapes with Dramatically Fewer Primitives

The paper introduces the "extruded plane-superquadric intersection" (XPSQ) primitive, which can represent a cup handle — an inherently non-convex geometry — with a single primitive by sweeping a shape along a spline. The practical implication: "a cup can be represented with only 3 XPSQ primitives... In contrast, convex decomposition using CoACD results in 36 primitives" (Section III). Fewer primitives means fewer pairwise collision checks, smoother gradients, and faster computation. This is the geometry representation layer that differentiable simulators have been missing.
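To make the "superquadric" and "plane" parts of the name concrete, here is a minimal JAX sketch of the standard superquadric inside-outside function clipped by a half-space. This is not the paper's XPSQ definition: the extrusion along a spline is omitted, and the function names, the parameters `scale`, `e1`, `e2`, and the hard boolean intersection are assumptions chosen for illustration.

```python
import jax.numpy as jnp


def superquadric_f(p, scale, e1, e2):
    """Standard superquadric inside-outside function: < 1 inside, > 1 outside."""
    x, y, z = jnp.abs(p / scale)
    xy = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy + z ** (2.0 / e1)


def inside_clipped_superquadric(p, scale, e1, e2, normal, offset):
    """True if p lies in the superquadric AND in the half-space normal.p <= offset."""
    in_superquadric = superquadric_f(p, scale, e1, e2) <= 1.0
    in_half_space = jnp.dot(normal, p) <= offset
    return jnp.logical_and(in_superquadric, in_half_space)


# Usage: with e1 = e2 = 1 the superquadric reduces to an ellipsoid (here a unit sphere),
# and the plane x = 0 keeps only the half with x <= 0.
scale = jnp.ones(3)
normal = jnp.array([1.0, 0.0, 0.0])
kept = inside_clipped_superquadric(jnp.array([-0.5, 0.0, 0.0]), scale, 1.0, 1.0, normal, 0.0)
cut = inside_clipped_superquadric(jnp.array([0.5, 0.0, 0.0]), scale, 1.0, 1.0, normal, 0.0)
```

Varying `e1` and `e2` moves the shape between rounded, boxy, and pinched forms, which is the usual reason a single superquadric can stand in for several convex pieces.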

Sphere Tracing as a Vectorizable Replacement for GJK/EPA in Contact Detection

Rather than patching GJK (which requires iterative solvers that are expensive to differentiate and poorly suited to GPU batching), the paper proposes sphere tracing for edge-SDF intersection. Critically: "We have empirically found three iterations to be more than sufficient for convergence" (Section II-C). Three fixed iterations means the computation graph is constant-depth, JIT-compilable, and trivially vectorizable across thousands of contact pairs in parallel — properties GJK-based methods cannot match.
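The constant-depth property can be sketched in a few lines of JAX. This is an illustrative sketch, not the paper's implementation: the sphere SDF, the clipping of the march parameter to the edge segment, and the names `trace_edge` and `trace_edges` are assumptions. Only the fixed count of three iterations comes from the text above.

```python
import jax
import jax.numpy as jnp


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return jnp.linalg.norm(p) - radius


def trace_edge(p0, p1, sdf=sphere_sdf, num_iters=3):
    """March from p0 towards p1, stepping by the SDF value each iteration."""
    length = jnp.linalg.norm(p1 - p0)
    direction = (p1 - p0) / length
    t = 0.0
    for _ in range(num_iters):  # fixed trip count, unrolled when traced
        t = t + sdf(p0 + t * direction)
        t = jnp.clip(t, 0.0, length)  # stay on the edge segment
    return p0 + t * direction


# The same fixed-depth computation is batched over many edges at once.
trace_edges = jax.jit(jax.vmap(trace_edge))

starts = jnp.array([[2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
ends = jnp.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
points = trace_edges(starts, ends)
```

In this usage the first edge crosses the unit sphere and the march stops on its surface, while the second edge never reaches the sphere and the march simply ends at the far endpoint. Because the loop has no data-dependent exit, every edge in the batch executes the same instructions.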

An Order-of-Magnitude Speed Improvement Over MJX at Scale

The computational efficiency benchmark (Figure 5) shows the proposed routine achieves roughly 10x speedup over MJX (MuJoCo's JAX-based differentiable backend) for contact manifold generation, on collisions between non-convex armadillo geometries across 100 random configurations. The paper notes the logarithmic scale and adds: "Another order of magnitude improvement can potentially be gained by employing any existing differentiable broad-phase routine for edge filtering" (Section III). That implies a potential 100x improvement pathway over the current state of the art in differentiable simulation throughput.

2. Contrarian Perspectives

Making Existing Collision Pipelines Differentiable is a Dead End — You Have to Rebuild from Scratch

The dominant approach in differentiable simulation research has been to take GJK/EPA (the industry-standard collision algorithm) and add smooth gradients on top. Two well-cited papers — Tracy et al.'s analytical smoothing via interior-point methods and Montaut et al.'s randomized smoothing via score function estimators — both operate in this paradigm. This paper explicitly argues that approach is fundamentally flawed: "we opt for this approach rather than trying to make existing routines differentiable, because these standard routines were conceived within a computer graphics, video game design, or computational science and engineering context with speed and minimal memory footprint as the primary concern (rather than differentiability and vectorization)" (Section I). The implication for companies building on MuJoCo, Drake, or PyBullet: incremental differentiation of those simulators won't get you to first-order training efficiency.

Convex Decomposition Is the Wrong Geometric Foundation for Robot Learning

The robotics industry has broadly standardized on convex decomposition (e.g., CoACD) for collision geometry. This paper argues that for differentiable simulation, this creates an irreconcilable problem: with N and M convex primitives per object, "N×M convex-convex checks are required" and "the broad-phase routine can still create jumps in the gradient (due to primitive pairs activating and deactivating)" (Section I). A cup requiring 36 convex primitives vs. 3 XPSQs isn't just an aesthetic difference: it is 12x more primitives per object, and because checks scale as N×M, the gap in pairwise checks (and in opportunities for gradient discontinuities) grows to as much as 144x when both objects in a pair are decomposed this way. Companies using learned shape representations or neural implicit surfaces for manipulation should take note: the geometry representation choice is not downstream of the learning problem; it is load-bearing infrastructure.
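The N×M scaling can be made concrete with a short count. The sketch below uses the cup figures quoted above (36 convex pieces vs. 3 XPSQs); the helper `all_pairs` and the two scenarios (a cup against a single-primitive object, and a cup against another cup) are illustrative assumptions, not benchmarks from the paper.

```python
import jax.numpy as jnp


def all_pairs(n_a, n_b):
    """Index pairs (i, j) that a narrow phase without filtering must visit."""
    ia, ib = jnp.meshgrid(jnp.arange(n_a), jnp.arange(n_b), indexing="ij")
    return jnp.stack([ia.ravel(), ib.ravel()], axis=-1)


# Cup against a single-primitive object.
convex_vs_one = all_pairs(36, 1).shape[0]
xpsq_vs_one = all_pairs(3, 1).shape[0]

# Cup against another cup.
convex_vs_cup = all_pairs(36, 36).shape[0]
xpsq_vs_cup = all_pairs(3, 3).shape[0]

ratio_one = convex_vs_one // xpsq_vs_one
ratio_cup = convex_vs_cup // xpsq_vs_cup
```

Against a single primitive the convex decomposition needs 12 times as many checks; for two cups the ratio is 1296 to 9, i.e. 144 times as many.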

Contact Manifold Instability Is a Geometry Problem, Not a Dynamics Problem

Most efforts to stabilize contact-rich simulation focus on contact dynamics solvers (spring-damper models, LCP formulations, compliant contact). This paper's Figure 4 comparison against MuJoCo's polygon clipping shows that even with a correct dynamics solver, non-smooth contact point generation causes instability: "polygon clipping results in non-smooth behavior of the contact points, whereas the proposed edge-SDF contact points adapt smoothly to the surface" (Section III). The proposed routine is described as "a middle ground between polygon clipping and the hydroelastic contact model, without needing to explicitly re-mesh the intersection volume." If your simulation is unstable during box-on-plane contact — the simplest possible manipulation setup — the problem may be upstream of your dynamics model.

3. Companies Identified

MuJoCo / Google DeepMind: The simulator most directly benchmarked against. The paper compares against both MuJoCo's standard polygon clipping routine and MJX (MuJoCo's JAX-based GPU-accelerated backend). The efficiency benchmark in Figure 5 shows approximately 10x throughput improvement over MJX for contact manifold generation. MuJoCo is cited as one of "all commonly used (non-differentiable) simulators in robotics" (Section I, ref [30]). This represents a direct competitive challenge to MJX as the preferred differentiable simulation backend for robot learning research.

PyBullet / Erwin Coumans: Cited alongside MuJoCo and Drake as one of the de-facto standard simulators using convex primitive decomposition (Section I, ref [5]). Not directly benchmarked but implicitly affected by the paper's argument that the entire class of GJK-based simulators has structural gradient pathologies.

Drake / Toyota Research Institute (Tedrake): Cited as part of the standard non-differentiable simulator ecosystem (Section I, ref [29]). Drake has invested heavily in hydroelastic contact modeling, which the paper references as an alternative to polygon clipping while noting its limitations: the hydroelastic approach requires "explicitly re-mesh[ing] the intersection volume... which is a non-differentiable and non-vectorizable routine" (Section III).

CoACD (Approximate Convex Decomposition): The geometry preprocessing tool whose output (36 primitives for a cup) is directly compared to the proposed method's 3-primitive representation. "Convex decomposition using CoACD results in 36 primitives, which, in the absence of a differentiable broad-phase routine, is highly inefficient" (Section III, ref [34]). CoACD is widely used in simulation pipelines; this paper argues its output format is incompatible with efficient differentiable simulation.

4. People Identified

Onur Beker, University of Tübingen (Martius Lab): Lead author and primary architect of the XPSQ primitive and sphere-tracing contact routine. This is part of a sustained research program: the paper cites two prior works by Beker et al. ([2] arXiv:2602.20304 and [3] IROS 2025) building toward differentiable rigid body simulation. He is emerging as a central figure in the differentiable simulation subfield focused on contact-rich manipulation.

Georg Martius, University of Tübingen: Principal investigator and senior author. Martius's lab has been consistently productive at the intersection of differentiable simulation, robot learning, and optimization. His group's SoftJAX library (ref [24]) provides the smooth operator approximations (sigmoid, softplus, logsumexp) that underpin the entire framework, which indicates this is a vertically integrated research program, not a one-off paper.

Anselm Paulus, University of Tübingen: Co-author and co-developer of the SoftJAX library. Also first author on a concurrent paper cited here: "Hard contacts with soft gradients: refining differentiable simulators for learning and control" (arXiv:2506.14186, ref [23]), which addresses polygon clipping differentiability. Paulus is working on complementary pieces of the same puzzle (contact dynamics and integration), while this paper addresses collision detection.

Kevin Tracy, Carnegie Mellon University (Manchester Lab): Not a co-author, but his work on differentiable collision detection via interior-point methods (ICRA 2023, ref [31]) is the primary prior art the paper positions against. Tracy's approach is acknowledged as robust for convex primitives but critiqued as architecturally limited for the manifold generation problem. His work on primal-dual interior-point differentiability (ref [32]) is directly used in one component of this paper's method.

Louis Montaut, INRIA / LAAS-CNRS: Author of the randomized smoothing approach to differentiable collision detection (ICRA 2023, ref [19]) and the HPP-FCL/Coal collision library. His approach represents the other major competing paradigm that this paper argues is insufficient for the full manifold generation problem.

Sylvain Calinon, IDIAP Research Institute: Collaborator appearing on two of the cited prior works by the same group (refs [2] and [3]). His group's work on movement primitives and distance fields (ref [17], Li & Calinon, RA-L 2025) directly informs the XPSQ spline projection method. Calinon's group bridges robot motion planning and differential geometry in ways relevant to this framework.

5. Operating Insights

If You're Training Manipulation Policies in Simulation, Your Bottleneck May Be Collision Detection — Not Your Learning Algorithm

The paper's central claim is operational: zeroth-order methods (evolutionary strategies, finite differences, model-free RL) dominate contact-rich robot learning not because they're better algorithms, but because first-order methods can't get clean gradients through simulation. The specific failure point is collision detection. If your team is scaling up GPU-parallelized RL for dexterous manipulation and hitting diminishing returns on sample efficiency, the fix may not be a better policy architecture or reward function; it may be switching to a differentiable simulator with a properly differentiable contact pipeline. The roughly 10x speedup over MJX for contact manifold generation (Figure 5, Section III), with a potential additional 10x from broad-phase filtering, suggests that gradient-based trajectory optimization for contact-rich tasks could become computationally competitive with model-free RL within the next 12-24 months. Note that the benchmark measures manifold generation only, not end-to-end training throughput.

Geometry Representation for Collision Is a First-Class Engineering Decision, Not a Preprocessing Afterthought

Most robotics engineering pipelines treat collision geometry as a solved problem: run CoACD on your meshes, get convex hulls, done. This paper quantifies the cost of that assumption: a cup requires 36 convex primitives vs. 3 XPSQs, with each additional primitive multiplying pairwise collision checks and introducing potential gradient discontinuities. For teams building simulation infrastructure for robot learning — especially for manipulation with complex tool geometries (cups, cables, hinged objects) — the choice of geometry representation now has measurable impact on training throughput and gradient quality. CTOs evaluating simulation stack investments should treat geometry representation as a load-bearing architectural decision alongside physics solver choice.

6. Overlooked Insights

The Lack of Automatic Mesh-to-XPSQ Decomposition Is the Paper's Critical Deployment Gap — and an Open Research Prize

The paper's honest self-assessment states a significant limitation plainly: "The main limitation of the proposed method is the lack of an established routine to automatically decompose a mesh into XPSQ primitives, which we leave to future work" (Section IV). In practice, this means the method currently requires manual or semi-manual geometry authoring to use XPSQ representations. Every robot manipulation task involves arbitrary object geometries from CAD files or scans; without automated decomposition, the 10x speedup is inaccessible for production pipelines. This is both the paper's biggest practical gap and a clear research/engineering opportunity: whoever builds a robust mesh-to-XPSQ pipeline (analogous to what CoACD does for convex decomposition) unlocks this entire framework for real deployment. It is the missing link between this academic result and a usable simulation backend.

Sphere Tracing's Three-Iteration Convergence Claim Is Empirical, Not Proven, and the Routine Can Fail on Coarsely Tessellated Meshes

The paper states: "We have empirically found three iterations to be more than sufficient for convergence" (Section II-C) for sphere tracing, and then adds a constraint that is easy to miss: the routine "is guaranteed to succeed only in cases when an edge penetrates the SDF only once or never. We argue that this is sufficient, because if there exists an edge that penetrates an SDF more than once, this indicates that the mesh tessellation is not at a density adequate for the complexity of the surfaces involved" (Section II-C). The argument is close to circular: the method works when mesh density is adequate, and mesh density is judged adequate when the method works. For teams dealing with real robot geometries from CAD files (which can be coarsely tessellated in specific regions, leaving long edges that may cross a surface more than once), this is a hidden failure mode. Engineering teams integrating this approach will likely need tessellation validation or adaptive remeshing as a preprocessing guard, which adds pipeline complexity that the paper does not address.
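One possible shape for such a preprocessing guard is sketched below in JAX. It is not from the paper. It assumes one reading of "penetrates more than once", namely that the edge passes through more than one separate inside region of the SDF, and it detects this by sampling. The names `count_inside_intervals` and `two_spheres_sdf`, and the sample count of 32, are hypothetical.

```python
import jax
import jax.numpy as jnp


def count_inside_intervals(p0, p1, sdf, num_samples=32):
    """Count separate stretches of the edge p0 -> p1 where sdf < 0 (by sampling)."""
    ts = jnp.linspace(0.0, 1.0, num_samples)
    points = p0 + ts[:, None] * (p1 - p0)
    inside = jax.vmap(sdf)(points) < 0.0
    # An interval starts wherever the edge goes from outside to inside,
    # plus one more if the edge already starts inside.
    entering = jnp.logical_and(~inside[:-1], inside[1:])
    return jnp.sum(entering) + inside[0].astype(jnp.int32)


def two_spheres_sdf(p, radius=0.5):
    """Toy non-convex SDF: union of two spheres on the x-axis (assumption)."""
    centers = jnp.array([[-1.5, 0.0, 0.0], [1.5, 0.0, 0.0]])
    return jnp.min(jnp.linalg.norm(p - centers, axis=-1)) - radius


long_edge = (jnp.array([-3.0, 0.0, 0.0]), jnp.array([3.0, 0.0, 0.0]))
short_edge = (jnp.array([-3.0, 0.0, 0.0]), jnp.array([-1.5, 0.0, 0.0]))
n_long = count_inside_intervals(*long_edge, two_spheres_sdf)
n_short = count_inside_intervals(*short_edge, two_spheres_sdf)
needs_subdivision = n_long > 1
```

Here the long edge passes through both spheres and would be flagged for subdivision, while the short edge enters only one. A sampling check like this can miss penetrations thinner than the sample spacing, so it is a heuristic screen, not a guarantee.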