Eight tasks.
Eight calibrated manipulation tasks, each mapped to evaluation dimensions used in academic benchmarks and graded on real hardware under VLATest perturbation operators.
16-camera capture
120 Hz proprioception
Cluttered Pantry
Retrieve a specified item from a densely populated shelf of visually similar distractors, under adversarial confounders, paraphrased instructions, and lighting that drifts between rollouts.
Translate-and-Fold
Receive a paraphrased natural-language instruction (5 mutations per rollout, mixed languages) and fold a deformable object accordingly. Probes language grounding and instruction robustness.
Cafeteria Tray
Plan and execute a 9-step assembly whose stages include opening dispensers, portioning, plating, garnishing, and serving. This composite long-horizon task maps directly to VLABench Track-6, where the success rate (SR) of π0-Fast peaks at 1.6%.
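A rough way to see why composite long-horizon tasks are so punishing: assume, as a simplification, that each of the nine steps succeeds independently with the same probability p. End-to-end success is then the product of the per-step rates. The independence and equal-rate assumptions are illustrative only and are not claimed for VLABench or for this task.

```latex
\mathrm{SR}_{\text{task}} = \prod_{i=1}^{9} p_i = p^{9} \quad (p_i = p \ \text{for all } i),
\qquad p = 0.9 \;\Rightarrow\; \mathrm{SR}_{\text{task}} = 0.9^{9} \approx 0.387,
\qquad \mathrm{SR}_{\text{task}} = 0.016 \;\Rightarrow\; p = 0.016^{1/9} \approx 0.63
```

Under this toy model, a policy that gets each individual step right 9 times out of 10 still fails most full episodes. An end-to-end rate as low as 1.6% on a nine-step chain would imply per-step reliability of only about 63%.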
Disturbed Pour
Pour granular media into a vessel while an adversarial human nudges the workspace. Probes online replanning, force-aware control, and recovery from contact failures.
OOD Glassware
Manipulate fragile transparent items absent from the training distribution. Camera-pose, lighting, and object-mesh perturbations are swept using the VLATest fuzzing operators.
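To make the idea of a perturbation sweep concrete, here is a minimal Python sketch. It enumerates a full grid of camera-pose, lighting, and object-mesh settings, then draws a reproducible, seeded subset of grid cells for a given evaluation run. The operator levels, field names, mesh identifiers, and the `full_sweep` / `sample_sweep` helpers are all hypothetical placeholders. They are not the VLATest API and not this benchmark's actual settings.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical operator levels (placeholders, not VLATest's real ranges).
CAMERA_YAW_OFFSETS_DEG = [0.0, 5.0, 10.0]
LIGHT_INTENSITY_SCALES = [0.5, 1.0, 1.5]
OBJECT_MESHES = ["wine_glass", "beaker", "tumbler"]  # hypothetical mesh IDs


@dataclass(frozen=True)
class Perturbation:
    """One cell of the sweep: a single combination of operator settings."""
    camera_yaw_deg: float
    light_scale: float
    mesh: str


def full_sweep() -> list[Perturbation]:
    """Cartesian product of every operator level (here 3 x 3 x 3 = 27 cells)."""
    return [
        Perturbation(yaw, light, mesh)
        for yaw, light, mesh in itertools.product(
            CAMERA_YAW_OFFSETS_DEG, LIGHT_INTENSITY_SCALES, OBJECT_MESHES
        )
    ]


def sample_sweep(n: int, seed: int) -> list[Perturbation]:
    """Draw up to n distinct cells; the same seed always yields the same subset."""
    rng = random.Random(seed)  # local RNG so global random state is untouched
    grid = full_sweep()
    return rng.sample(grid, k=min(n, len(grid)))
```

For example, `for p in sample_sweep(5, seed=42): print(p)` lists five distinct perturbation settings, and rerunning with the same seed reproduces them exactly. Sampling from an explicit grid, rather than drawing each operator level independently per rollout, avoids duplicate cells and makes coverage easy to audit.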
Partner-Handoff
A human arm and a robotic arm collaborate to pass and assemble pieces. Probes intent inference, handoff timing, dynamic affordances, and safety envelopes in a shared workspace.
Map-Free Navigation
Navigate an unmapped office to fetch an object specified only by a referring expression ("the blue mug Maria left near the window"). Probes spatial-language grounding and memory-driven exploration.
Tool-Improvise
The intended tool is missing, so the robot must reach the goal using a non-canonical substitute (e.g., a ruler in place of a spatula). Tests creative tool reuse and physical-law reasoning.