March 10, 2026

Why Parameter Sweeps Are the Secret to Sim-to-Real Robotics

Boston Dynamics runs approximately 150 million simulations per maneuver for Atlas. Figure AI trains across more than 200,000 parallel environments. Agility Robotics accumulates "decades of simulated time over three or four days." The common thread? Massive parameter sweeps. Here's why the best robotics teams in the world treat simulation scale as a first-class engineering problem.

The Reality Gap Problem

Every robotics engineer who has trained an RL policy in simulation and deployed it on hardware has felt this: the policy that looked perfect in sim stumbles, oscillates, or falls over in the real world.

This is the sim-to-real gap — the performance difference between simulation and reality. It arises from physics models that simplify contact dynamics, sensors that behave differently under real lighting, and actuators that don't respond exactly as modeled.

As Salvato et al. (2021) put it in their IEEE survey: "The most undesirable result occurs when the controller learnt in simulation fails the task on the real robot, thus resulting in an unsuccessful sim-to-real transfer."

A policy trained in one perfect simulation environment will learn to exploit that environment's specific quirks — quirks that don't exist on the real robot. The solution isn't a better simulator. It's a broader one.

Domain Randomization: The Industry Standard

In 2017, Tobin et al. at OpenAI introduced a deceptively simple idea: instead of trying to make simulation perfectly realistic, make it randomly unrealistic in many different ways. Vary friction, mass, lighting, textures, actuator delays — everything. Train across all of it.

The key insight: "With enough variability in the simulator, the real world may appear to the model as just another variation."

This technique — domain randomization — has become the dominant approach for sim-to-real transfer in robotics RL. It works because a policy trained across thousands of parameter variations can't rely on any single simulator quirk. It has to learn the underlying physics.
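In code, the core of domain randomization is just re-sampling physics parameters on every episode. A minimal sketch of the idea (the parameter names and ranges here are illustrative, not tied to any particular simulator):

```python
import random

# Illustrative ranges; real projects tune these per robot and per simulator.
RANDOMIZATION_RANGES = {
    "friction": (0.2, 1.0),
    "mass_scale": (0.8, 1.2),
    "actuator_delay_ms": (0.0, 20.0),
}

def sample_physics_params(rng=random):
    """Draw one random physics configuration for the next episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

# On every environment reset, the simulator is reconfigured with a fresh
# sample, so the policy never trains against the same physics twice.
episode_params = sample_physics_params()
```

Because each episode lands somewhere different in this parameter box, any strategy that depends on one exact friction or mass value stops paying off, and the policy is pushed toward behavior that works across the whole range.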

NVIDIA demonstrated this concretely: by applying structured domain randomization to door detection training in Isaac Sim, model accuracy on real-world images jumped from 5% to 87% — using only synthetic training data. (NVIDIA Developer Blog)

What the Best Teams Actually Do

Domain randomization sounds straightforward on paper. In practice, it requires running training at enormous scale. Here's what leading robotics companies are doing:

Boston Dynamics (Atlas)

Each Atlas maneuver is honed with data from approximately 150 million simulation runs. Multiple concurrent Atlas instances run in parallel, and policies transfer zero-shot to the 90kg physical robot. Their internal Spot robustness fleet operates 24/7, logging over 2,000 hours of testing per week.

Figure AI (Figure 02)

Figure's S0 whole-body control policy trains across more than 200,000 parallel environments with extensive domain randomization, transferring zero-shot to real hardware. A 10-million-parameter neural network replaced 100,000 lines of hand-written control code. Their key framing: "collecting years' worth of simulated demonstrations in a few hours."

Agility Robotics (Digit)

Agility trains Digit's whole-body control foundation model — under 1 million parameters — "for decades of simulated time over three or four days." They cross-validate policies in containerized MuJoCo to expose corner cases that a single simulator might miss.

ETH Zurich (ANYmal)

Rudin et al. (2022) showed that massively parallel simulation on a single GPU could train a quadruped to walk on flat terrain in under 4 minutes and rough terrain in 20 minutes. This paper established the recipe now used across the industry: PPO + thousands of parallel environments + domain randomization.
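The reason a single GPU can train this fast is that every environment advances in one batched operation rather than in a Python loop. A toy NumPy sketch of that idea (real implementations run full rigid-body physics on the GPU; the state layout here is invented for illustration):

```python
import numpy as np

NUM_ENVS = 4096  # one batch of environments, as in the paper

# Toy batched "simulator": each environment's state is one row, so a single
# vectorized operation steps all 4,096 environments at once.
states = np.zeros((NUM_ENVS, 3))                    # e.g. [x, velocity, heading]
actions = np.random.uniform(-1, 1, (NUM_ENVS, 3))

def step_all(states, actions, dt=0.02):
    """Advance every environment in one call -- the core of massive parallelism."""
    return states + dt * actions

states = step_all(states, actions)
# states.shape == (4096, 3): one simulation step for all environments
```

PPO fits this pattern well because it consumes large on-policy batches; thousands of environments each contributing a few steps per update keeps the GPU saturated.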

| Company | Scale | Approach |
|---|---|---|
| Boston Dynamics | ~150M sims per maneuver | Massive parallel sweep, zero-shot transfer |
| Figure AI | 200K+ parallel environments | Domain randomization, replaced hand-coded control |
| Agility Robotics | Decades of sim-time in days | Foundation model + cross-simulator validation |
| ETH Zurich | 4,096 parallel envs / GPU | Flat terrain walking in <4 minutes |

What to Randomize (and How Much)

Not all parameters are equal. The robotics RL literature and NVIDIA's Isaac Lab documentation converge on these categories:

- Physics parameters (highest impact for locomotion): friction, mass and mass distribution, actuator delays.
- Sensor noise (critical for deployment): noise profiles applied to the observations the policy consumes.
- Environment variation (terrain generalization): terrain types, lighting, and textures.

Isaac Lab supports three randomization modes: direct (overwrite with random values), additive (add noise to defaults), and scaling (multiply defaults by random factors). Randomization triggers on every RL frame, at fixed intervals, or on environment reset.
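The three modes reduce to simple arithmetic on a parameter's default value. A minimal sketch of what each computes (this illustrates the idea only; it is not Isaac Lab's actual API):

```python
import random

def randomize(default, low, high, mode, rng=random):
    """Sketch of the three randomization modes.

    'direct'   -> overwrite the default with a random value
    'additive' -> add random noise to the default
    'scaling'  -> multiply the default by a random factor
    """
    sample = rng.uniform(low, high)
    if mode == "direct":
        return sample
    if mode == "additive":
        return default + sample
    if mode == "scaling":
        return default * sample
    raise ValueError(f"unknown mode: {mode}")

# e.g. scale a default friction of 0.6 by a factor drawn from [0.9, 1.1]
friction = randomize(0.6, 0.9, 1.1, mode="scaling")
```

Scaling is the usual choice when defaults come from a calibrated model (you perturb around a trusted value), while direct mode is useful when no meaningful default exists.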

The GPU Problem: One Machine Isn't Enough

Here's where theory meets infrastructure. NVIDIA's Isaac Lab can run 4,096 parallel environments on a single RTX 4090, achieving 85,000-95,000 steps per second. That's impressive for a single training configuration.

But domain randomization at the level these companies operate means running many different training configurations, not just many environments within one. You need to sweep across friction ranges, mass distributions, terrain types, and sensor noise profiles — each as a separate training run that produces a separate policy for evaluation.

A single GPU gives you parallel environments within one run. Distributed GPU infrastructure gives you parallel runs across your entire parameter space.
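Concretely, a sweep is the cross product of the axes you want to cover, with each combination becoming an independent training run. A toy sketch with illustrative axes and values:

```python
from itertools import product

# Illustrative sweep axes; a real sweep uses the ranges your robot needs.
friction_levels = [0.2, 0.6, 1.0]
mass_scales = [0.8, 1.0, 1.2]
terrains = ["flat", "rough"]

# Each combination is a separate training run producing a separate policy.
sweep = [
    {"friction": f, "mass_scale": m, "terrain": t}
    for f, m, t in product(friction_levels, mass_scales, terrains)
]
# 3 * 3 * 2 = 18 independent runs to distribute across GPUs
```

Even this small example shows why sweeps outgrow one machine: the run count multiplies with every axis you add, and each run is itself a full training job.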

| Approach | What You Get | Limitation |
|---|---|---|
| Single GPU, 4,096 envs | Fast training for one config | Sequential across parameter sweep |
| Multi-GPU cluster | Parallel sweeps | Expensive idle time, complex setup |
| Cloud GPU orchestration | Parallel sweeps, pay-per-use | Need orchestration layer |

Running Parameter Sweeps with Canard

This is the problem we built Canard to solve. Instead of managing GPU clusters or SSHing into boxes, you define your sweep and let the platform handle distribution across cloud GPUs.

```bash
# Install the SDK
pip install canard
```

```python
# Define a domain randomization sweep
from canard import Client

client = Client()

run = client.submit_run(
    name="Go2-Robustness-Sweep",
    config={
        "task_name": "Template-Go2-Standing-Direct-v0",
        "num_envs": 4096,
        "max_iterations": 5000,
        "friction_range": [0.2, 1.0],
        "mass_scale_range": [0.8, 1.2],
        "num_samples": 500,
    },
)

# Workers on RunPod, Vast.ai, or AWS each take a
# slice of the parameter space automatically
run.wait_for_completion()
run.download_results("./results")
```

Each worker gets a unique slice of the parameter space — a specific combination of friction, mass, and other randomization parameters. Workers run Isaac Sim on cloud GPUs ($0.40/hr on RTX 4090, $0.80/hr on RTX 5090), execute their training config, and upload the resulting policy and metrics. No cluster to manage, no idle costs.

The advantage over a single-GPU workflow is straightforward: a 500-configuration sweep that would take weeks sequentially finishes in hours.

Further Reading

If you're going deeper on sim-to-real and domain randomization, start with the papers cited above:

- Tobin et al. (2017), "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World"
- Salvato et al. (2021), IEEE survey on sim-to-real transfer of robot controllers trained with reinforcement learning
- Rudin et al. (2022), "Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning"

Ready to run your first parameter sweep?

Go from a single GPU to distributed sweeps across cloud GPUs in minutes.

Get started with Canard