§ 01 · Leaderboard


Vision-language-action (VLA) models evaluated on calibrated physical benches. Entries are sorted by ARC composite score and re-tested continuously.

Last sync 02:14 UTC
CI 95% reported
Bench v0.4.2

Model · Architecture

Helix-2 · Figure · 8.4 B · 2026·02
Hierarchical S2/S1 VLA · 58.4 · 18.2 · 47.1 · 71.0

π0.5 · Physical Intelligence · 3.3 B · 2025·11
Flow-matching VLA · 54.1 · 16.4 · 44.0 · 64.5

Gemini-Robotics 1.5 · Google DeepMind · — · 2025·10
Multimodal VLA · 62.9 · 11.2 · 41.5 · 56.0

GR00T N1.5 · NVIDIA · 12 B · 2025·09
Dual-system VLA · 51.0 · 13.6 · 39.8 · 60.2

RoboBrain-2.0 · BAAI · 7 B · 2025·08
Embodied-finetuned MLLM + π adapter · 56.8 · 9.6 · 36.2 · 48.0

π0-Fast · Physical Intelligence · 3.0 B · 2025·05
Discretized flow VLA · 47.2 · 9.2 · 35.7 · 52.4

Showing 1–6 of 9
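
The page notes that 95% confidence intervals are reported but does not say how they are computed. As a rough illustration only, the sketch below shows one common way to put a 95% interval around a success rate measured over repeated physical trials: the Wilson score interval. It assumes a score is a fraction of successful trials, which the leaderboard does not confirm; the function name and the trial counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    Assumption: each trial is an independent pass/fail outcome.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical example: 50 successful episodes out of 100 attempts.
    low, high = wilson_interval(50, 100)
    print(f"success rate 50.0%, 95% CI [{low:.1%}, {high:.1%}]")
```

The Wilson interval is used here rather than the simpler normal approximation because it stays within [0, 1] and behaves sensibly at the small trial counts typical of physical robot evaluations.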