Why Parameter Sweeps Are the Secret to Sim-to-Real Robotics
Boston Dynamics runs approximately 150 million simulations per maneuver for Atlas. Figure AI trains across more than 200,000 parallel environments. Agility Robotics accumulates "decades of simulated time over three or four days." The common thread? Massive parameter sweeps. Here's why the best robotics teams in the world treat simulation scale as a first-class engineering problem.
The Reality Gap Problem
Every robotics engineer who has trained an RL policy in simulation and deployed it on hardware has felt this: the policy that looked perfect in sim stumbles, oscillates, or falls over in the real world.
This is the sim-to-real gap — the performance difference between simulation and reality. It arises from physics models that simplify contact dynamics, sensors that behave differently under real lighting, and actuators that don't respond exactly as modeled.
As Salvato et al. (2021) put it in their IEEE survey: "The most undesirable result occurs when the controller learnt in simulation fails the task on the real robot, thus resulting in an unsuccessful sim-to-real transfer."
A policy trained in one perfect simulation environment will learn to exploit that environment's specific quirks — quirks that don't exist on the real robot. The solution isn't a better simulator. It's a broader one.
Domain Randomization: The Industry Standard
In 2017, Tobin et al. at OpenAI introduced a deceptively simple idea: instead of trying to make simulation perfectly realistic, make it randomly unrealistic in many different ways. Vary friction, mass, lighting, textures, actuator delays — everything. Train across all of it.
The key insight: "With enough variability in the simulator, the real world may appear to the model as just another variation."
This technique — domain randomization — has become the dominant approach for sim-to-real transfer in robotics RL. It works because a policy trained across thousands of parameter variations can't rely on any single simulator quirk. It has to learn the underlying physics.
NVIDIA demonstrated this concretely: by applying structured domain randomization to door detection training in Isaac Sim, model accuracy on real-world images jumped from 5% to 87% — using only synthetic training data. (NVIDIA Developer Blog)
What the Best Teams Actually Do
Domain randomization sounds straightforward on paper. In practice, it requires running training at enormous scale. Here's what leading robotics companies are doing:
Boston Dynamics (Atlas)
Each Atlas maneuver is honed with data from approximately 150 million simulation runs. Multiple Atlas instances run concurrently in simulation, and policies transfer zero-shot to the 90 kg physical robot. Their internal Spot robustness fleet operates 24/7, logging over 2,000 hours of testing per week.
Figure AI (Figure 02)
Figure's S0 whole-body control policy trains across more than 200,000 parallel environments with extensive domain randomization, transferring zero-shot to real hardware. A 10-million-parameter neural network replaced 100,000 lines of hand-written control code. Their key framing: "collecting years' worth of simulated demonstrations in a few hours."
Agility Robotics (Digit)
Agility trains Digit's whole-body control foundation model — under 1 million parameters — "for decades of simulated time over three or four days." They cross-validate policies in containerized MuJoCo to expose corner cases that a single simulator might miss.
ETH Zurich (ANYmal)
Rudin et al. (2022) showed that massively parallel simulation on a single GPU could train a quadruped to walk on flat terrain in under 4 minutes and rough terrain in 20 minutes. This paper established the recipe now used across the industry: PPO + thousands of parallel environments + domain randomization.
| Company | Scale | Approach |
|---|---|---|
| Boston Dynamics | ~150M sims per maneuver | Massive parallel sweep, zero-shot transfer |
| Figure AI | 200K+ parallel environments | Domain randomization, replaced hand-coded control |
| Agility Robotics | Decades of sim-time in days | Foundation model + cross-simulator validation |
| ETH Zurich | 4,096 parallel envs / GPU | Flat terrain walking in <4 minutes |
What to Randomize (and How Much)
Not all parameters are equal. The robotics RL literature and NVIDIA's Isaac Lab documentation converge on these categories:
Physics parameters (highest impact for locomotion):
- Ground friction coefficients (0.2–1.0)
- Link masses (80–120% of nominal)
- Joint damping and PD gains
- Motor strength scaling
- Center-of-mass offsets
Sensor noise (critical for deployment):
- IMU bias and drift
- Joint encoder noise
- Observation delay (1–3 timesteps)
- Action delay and latency
Environment variation (terrain generalization):
- Terrain roughness and slope
- External force perturbations
- Payload mass variation
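To make the categories above concrete, here is a minimal sketch of per-episode sampling. The ground-friction and mass ranges come from the lists above; the motor-strength, center-of-mass, and IMU-bias numbers are illustrative placeholders, not tuned values for any particular robot.

```python
import numpy as np

# Ranges for the physics parameters listed above; the motor-strength,
# CoM-offset, and IMU-bias numbers are assumed for illustration.
PHYSICS_RANGES = {
    "ground_friction": (0.2, 1.0),    # friction coefficient
    "mass_scale":      (0.8, 1.2),    # fraction of nominal link mass
    "motor_scale":     (0.9, 1.1),    # motor strength multiplier (assumed)
    "com_offset_m":    (-0.02, 0.02), # center-of-mass shift in metres (assumed)
}

def sample_episode_params(rng: np.random.Generator) -> dict:
    """Draw one randomized configuration, e.g. on environment reset."""
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PHYSICS_RANGES.items()}
    params["imu_bias"] = rng.uniform(-0.05, 0.05)        # rad/s, assumed range
    params["obs_delay_steps"] = int(rng.integers(1, 4))  # 1-3 timesteps, inclusive
    return params

params = sample_episode_params(np.random.default_rng(0))
```

A fresh draw on every episode is what forces the policy to learn physics rather than memorize one simulator configuration.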
Isaac Lab supports three randomization modes: direct (overwrite with random values), additive (add noise to defaults), and scaling (multiply defaults by random factors). Randomization triggers on every RL frame, at fixed intervals, or on environment reset.
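The semantics of those three modes can be expressed as plain functions. This is an illustrative sketch of the behavior, not Isaac Lab's actual API:

```python
import numpy as np

def randomize(default, mode, rng, low, high):
    """Sketch of the three randomization modes (not Isaac Lab's real API):
    'direct' replaces the default, 'additive' perturbs it, 'scaling' rescales it."""
    if mode == "direct":
        return rng.uniform(low, high)
    if mode == "additive":
        return default + rng.uniform(low, high)
    if mode == "scaling":
        return default * rng.uniform(low, high)
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
friction = randomize(0.6, "scaling", rng, 0.5, 1.5)  # lands in [0.3, 0.9]
```

Scaling is often the safest default for physical quantities like mass, since it preserves sign and rough magnitude; direct mode is better when the nominal value itself is a guess.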
The GPU Problem: One Machine Isn't Enough
Here's where theory meets infrastructure. NVIDIA's Isaac Lab can run 4,096 parallel environments on a single RTX 4090, achieving 85,000-95,000 steps per second. That's impressive for a single training configuration.
But domain randomization at the level these companies operate means running many different training configurations, not just many environments within one. You need to sweep across friction ranges, mass distributions, terrain types, and sensor noise profiles — each as a separate training run that produces a separate policy for evaluation.
A single GPU gives you parallel environments within one run. Distributed GPU infrastructure gives you parallel runs across your entire parameter space.
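A sweep over training configurations is just the Cartesian product of the axes you care about. A minimal sketch, with hypothetical axis values:

```python
from itertools import product

# Hypothetical sweep axes; every combination becomes one independent
# training run producing one policy to evaluate.
FRICTION_RANGES = [(0.2, 0.6), (0.4, 1.0)]
MASS_SCALES     = [(0.9, 1.1), (0.8, 1.2)]
TERRAINS        = ["flat", "rough", "stairs"]
NOISE_PROFILES  = ["low", "high"]

sweep = [
    {"friction": f, "mass": m, "terrain": t, "noise": n}
    for f, m, t, n in product(FRICTION_RANGES, MASS_SCALES, TERRAINS, NOISE_PROFILES)
]
print(len(sweep))  # 2 * 2 * 3 * 2 = 24 training configurations
```

Even this toy grid yields 24 separate runs; realistic axes with finer granularity push the count into the hundreds, which is exactly where sequential single-GPU execution stops scaling.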
| Approach | What You Get | Limitation |
|---|---|---|
| Single GPU, 4,096 envs | Fast training for one config | Sequential across parameter sweep |
| Multi-GPU cluster | Parallel sweeps | Expensive idle time, complex setup |
| Cloud GPU orchestration | Parallel sweeps, pay-per-use | Need orchestration layer |
Running Parameter Sweeps with Canard
This is the problem we built Canard to solve. Instead of managing GPU clusters or SSHing into boxes, you define your sweep and let the platform handle distribution across cloud GPUs.
Each worker gets a unique slice of the parameter space — a specific combination of friction, mass, and other randomization parameters. Workers run Isaac Sim on cloud GPUs ($0.40/hr on RTX 4090, $0.80/hr on RTX 5090), execute their training config, and upload the resulting policy and metrics. No cluster to manage, no idle costs.
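One common way to carve a sweep into per-worker slices is strided assignment. This sketch assumes that scheme purely for illustration; Canard's actual assignment logic may differ:

```python
def configs_for_worker(sweep, worker_id, num_workers):
    """Strided slice: worker i takes configs i, i+N, i+2N, ...
    A common load-balancing pattern; shown here as an illustration,
    not as Canard's real scheduling code."""
    return sweep[worker_id::num_workers]

sweep = [{"config": i} for i in range(10)]
worker_0 = configs_for_worker(sweep, 0, 4)  # configs 0, 4, 8
```

Striding keeps the slices nearly equal in size even when the config count doesn't divide evenly by the worker count.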
The advantage over a single-GPU workflow is straightforward: a 500-configuration sweep that would take weeks sequentially finishes in hours.
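The back-of-envelope arithmetic behind that claim, assuming an illustrative 2 GPU-hours per configuration (not a measured figure):

```python
# Back-of-envelope only; 2 GPU-hours per config is an assumption
# for illustration, and real training times vary widely.
configs = 500
hours_per_config = 2.0

sequential_hours = configs * hours_per_config        # 1000 h, roughly 6 weeks
parallel_hours = (configs / 100) * hours_per_config  # with 100 workers: 10 h
```

The sequential total scales linearly with the number of configurations, while the parallel total scales with configurations divided by workers, which is why wall-clock time collapses from weeks to hours once runs are distributed.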
Further Reading
If you're going deeper on sim-to-real and domain randomization, these are the essential references:
- Tobin et al. (2017) — The original domain randomization paper. Start here.
- OpenAI — Learning Dexterous In-Hand Manipulation (2018) — Dexterous manipulation via domain randomization with a Shadow Hand.
- OpenAI — Solving Rubik's Cube (2019) — Introduced Automatic Domain Randomization (ADR).
- Peng et al. (2018) — Dynamics randomization for sim-to-real robotic control. Over 1,500 citations.
- Rudin et al. (2022) — Learning to walk in minutes using massively parallel RL. The modern recipe.
- NVIDIA — Spot Locomotion with Isaac Lab (2025) — Zero-shot Spot deployment, 4,096 envs on RTX 4090.
- Zhao et al. (2020) — Survey of sim-to-real transfer methods in deep RL.
Ready to run your first parameter sweep?
Go from a single GPU to distributed sweeps across cloud GPUs in minutes.
Get started with Canard