Why Parameter Sweeps Are the Secret to Sim-to-Real Robotics
Boston Dynamics runs approximately 150 million simulations per maneuver for Atlas. Figure AI trains across more than 200,000 parallel environments. Agility Robotics accumulates "decades of simulated time over three or four days." The common thread? Massive parameter sweeps. Here's why the best robotics teams in the world treat simulation scale as a first-class engineering problem.
The Reality Gap Problem
Every robotics engineer who has trained an RL policy in simulation and deployed it on hardware has felt this: the policy that looked perfect in sim stumbles, oscillates, or falls over in the real world.
This is the sim-to-real gap — the performance difference between simulation and reality. It arises from physics models that simplify contact dynamics, sensors that behave differently under real lighting, and actuators that don't respond exactly as modeled.
As Salvato et al. (2021) put it in their IEEE survey: "The most undesirable result occurs when the controller learnt in simulation fails the task on the real robot, thus resulting in an unsuccessful sim-to-real transfer."
A policy trained in one perfect simulation environment will learn to exploit that environment's specific quirks — quirks that don't exist on the real robot. The solution isn't a better simulator. It's a broader one.
Domain Randomization: The Industry Standard
In 2017, Tobin et al. at OpenAI introduced a deceptively simple idea: instead of trying to make simulation perfectly realistic, make it randomly unrealistic in many different ways. Vary friction, mass, lighting, textures, actuator delays — everything. Train across all of it.
The key insight: "With enough variability in the simulator, the real world may appear to the model as just another variation."
This technique — domain randomization — has become the dominant approach for sim-to-real transfer in robotics RL. It works because a policy trained across thousands of parameter variations can't rely on any single simulator quirk. It has to learn the underlying physics.
NVIDIA demonstrated this concretely: by applying structured domain randomization to door detection training in Isaac Sim, model accuracy on real-world images jumped from 5% to 87% — using only synthetic training data. (NVIDIA Developer Blog)
What the Best Teams Actually Do
Domain randomization sounds straightforward on paper. In practice, it requires running training at enormous scale. Here's what leading robotics companies are doing:
Boston Dynamics (Atlas)
Each Atlas maneuver is honed with data from approximately 150 million simulation runs. Multiple Atlas instances run concurrently in simulation, and policies transfer zero-shot to the 90 kg physical robot. Their internal Spot robustness fleet operates 24/7, logging over 2,000 hours of testing per week.
Figure AI (Figure 02)
Figure's S0 whole-body control policy trains across more than 200,000 parallel environments with extensive domain randomization, transferring zero-shot to real hardware. A 10-million-parameter neural network replaced 100,000 lines of hand-written control code. Their key framing: "collecting years' worth of simulated demonstrations in a few hours."
Agility Robotics (Digit)
Agility trains Digit's whole-body control foundation model — under 1 million parameters — "for decades of simulated time over three or four days." They cross-validate policies in containerized MuJoCo to expose corner cases that a single simulator might miss.
ETH Zurich (ANYmal)
Rudin et al. (2022) showed that massively parallel simulation on a single GPU could train a quadruped to walk on flat terrain in under 4 minutes and rough terrain in 20 minutes. This paper established the recipe now used across the industry: PPO + thousands of parallel environments + domain randomization.
| Company | Scale | Approach |
|---|---|---|
| Boston Dynamics | ~150M sims per maneuver | Massive parallel sweep, zero-shot transfer |
| Figure AI | 200K+ parallel environments | Domain randomization, replaced hand-coded control |
| Agility Robotics | Decades of sim-time in days | Foundation model + cross-simulator validation |
| ETH Zurich | 4,096 parallel envs / GPU | Flat terrain walking in <4 minutes |
What to Randomize (and How Much)
Not all parameters are equal. The robotics RL literature and NVIDIA's Isaac Lab documentation converge on these categories:
Physics parameters (highest impact for locomotion):
- Ground friction coefficients (0.2–1.0)
- Link masses (80–120% of nominal)
- Joint damping and PD gains
- Motor strength scaling
- Center-of-mass offsets
Sensor noise (critical for deployment):
- IMU bias and drift
- Joint encoder noise
- Observation delay (1–3 timesteps)
- Action delay and latency
Environment variation (terrain generalization):
- Terrain roughness and slope
- External force perturbations
- Payload mass variation
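To make the categories above concrete, here is a minimal sketch of per-episode sampling. The ground-friction and mass ranges come from the lists above; the motor-strength, center-of-mass, and IMU-bias numbers are illustrative placeholders, not tuned values for any particular robot.

```python
import numpy as np

# Ranges for the physics parameters listed above; the motor-strength,
# CoM-offset, and IMU-bias numbers are assumed for illustration.
PHYSICS_RANGES = {
    "ground_friction": (0.2, 1.0),    # friction coefficient
    "mass_scale":      (0.8, 1.2),    # fraction of nominal link mass
    "motor_scale":     (0.9, 1.1),    # motor strength multiplier (assumed)
    "com_offset_m":    (-0.02, 0.02), # center-of-mass shift in metres (assumed)
}

def sample_episode_params(rng: np.random.Generator) -> dict:
    """Draw one randomized configuration, e.g. on environment reset."""
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PHYSICS_RANGES.items()}
    params["imu_bias"] = rng.uniform(-0.05, 0.05)        # rad/s, assumed range
    params["obs_delay_steps"] = int(rng.integers(1, 4))  # 1-3 timesteps, inclusive
    return params

params = sample_episode_params(np.random.default_rng(0))
```

A fresh draw on every episode is what forces the policy to learn physics rather than memorize one simulator configuration.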
Isaac Lab supports three randomization modes: direct (overwrite with random values), additive (add noise to defaults), and scaling (multiply defaults by random factors). Randomization triggers on every RL frame, at fixed intervals, or on environment reset.
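The semantics of those three modes can be expressed as plain functions. This is an illustrative sketch of the behavior, not Isaac Lab's actual API:

```python
import numpy as np

def randomize(default, mode, rng, low, high):
    """Sketch of the three randomization modes (not Isaac Lab's real API):
    'direct' replaces the default, 'additive' perturbs it, 'scaling' rescales it."""
    if mode == "direct":
        return rng.uniform(low, high)
    if mode == "additive":
        return default + rng.uniform(low, high)
    if mode == "scaling":
        return default * rng.uniform(low, high)
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
friction = randomize(0.6, "scaling", rng, 0.5, 1.5)  # lands in [0.3, 0.9]
```

Scaling is often the safest default for physical quantities like mass, since it preserves sign and rough magnitude; direct mode is better when the nominal value itself is a guess.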
The GPU Problem: One Machine Isn't Enough
Here's where theory meets infrastructure. NVIDIA's Isaac Lab can run 4,096 parallel environments on a single RTX 4090, achieving 85,000-95,000 steps per second. That's impressive for a single training configuration.
But domain randomization at the level these companies operate means running many different training configurations, not just many environments within one. You need to sweep across friction ranges, mass distributions, terrain types, and sensor noise profiles — each as a separate training run that produces a separate policy for evaluation.
A single GPU gives you parallel environments within one run. Distributed GPU infrastructure gives you parallel runs across your entire parameter space.
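A sweep over training configurations is just the Cartesian product of the axes you care about. A minimal sketch, with hypothetical axis values:

```python
from itertools import product

# Hypothetical sweep axes; every combination becomes one independent
# training run producing one policy to evaluate.
FRICTION_RANGES = [(0.2, 0.6), (0.4, 1.0)]
MASS_SCALES     = [(0.9, 1.1), (0.8, 1.2)]
TERRAINS        = ["flat", "rough", "stairs"]
NOISE_PROFILES  = ["low", "high"]

sweep = [
    {"friction": f, "mass": m, "terrain": t, "noise": n}
    for f, m, t, n in product(FRICTION_RANGES, MASS_SCALES, TERRAINS, NOISE_PROFILES)
]
print(len(sweep))  # 2 * 2 * 3 * 2 = 24 training configurations
```

Even this toy grid yields 24 separate runs; realistic axes with finer granularity push the count into the hundreds, which is exactly where sequential single-GPU execution stops scaling.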
| Approach | What You Get | Limitation |
|---|---|---|
| Single GPU, 4,096 envs | Fast training for one config | Sequential across parameter sweep |
| Multi-GPU cluster | Parallel sweeps | Expensive idle time, complex setup |
| Cloud GPU orchestration | Parallel sweeps, pay-per-use | Need orchestration layer |
Running Parameter Sweeps with Canard
This is the problem we built Canard to solve. Instead of managing GPU clusters or SSHing into boxes, you define your sweep and let the platform handle distribution across cloud GPUs.
Each worker gets a unique slice of the parameter space — a specific combination of friction, mass, and other randomization parameters. Workers run Isaac Sim on cloud GPUs ($0.40/hr on RTX 4090, $0.80/hr on RTX 5090), execute their training config, and upload the resulting policy and metrics. No cluster to manage, no idle costs.
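One common way to carve a sweep into per-worker slices is strided assignment. This sketch assumes that scheme purely for illustration; Canard's actual assignment logic may differ:

```python
def configs_for_worker(sweep, worker_id, num_workers):
    """Strided slice: worker i takes configs i, i+N, i+2N, ...
    A common load-balancing pattern; shown here as an illustration,
    not as Canard's real scheduling code."""
    return sweep[worker_id::num_workers]

sweep = [{"config": i} for i in range(10)]
worker_0 = configs_for_worker(sweep, 0, 4)  # configs 0, 4, 8
```

Striding keeps the slices nearly equal in size even when the config count doesn't divide evenly by the worker count.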
The advantage over a single-GPU workflow is straightforward: a 500-configuration sweep that would take weeks sequentially finishes in hours.
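The back-of-envelope arithmetic behind that claim, assuming an illustrative 2 GPU-hours per configuration (not a measured figure):

```python
# Back-of-envelope only; 2 GPU-hours per config is an assumption
# for illustration, and real training times vary widely.
configs = 500
hours_per_config = 2.0

sequential_hours = configs * hours_per_config        # 1000 h, roughly 6 weeks
parallel_hours = (configs / 100) * hours_per_config  # with 100 workers: 10 h
```

The sequential total scales linearly with the number of configurations, while the parallel total scales with configurations divided by workers, which is why wall-clock time collapses from weeks to hours once runs are distributed.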
Further Reading
If you're going deeper on sim-to-real and domain randomization, these are the essential references:
- Tobin et al. (2017) — The original domain randomization paper. Start here.
- OpenAI — Learning Dexterous In-Hand Manipulation (2018) — Dexterous manipulation via domain randomization with a Shadow Hand.
- OpenAI — Solving Rubik's Cube (2019) — Introduced Automatic Domain Randomization (ADR).
- Peng et al. (2018) — Dynamics randomization for sim-to-real robotic control. Over 1,500 citations.
- Rudin et al. (2022) — Learning to walk in minutes using massively parallel RL. The modern recipe.
- NVIDIA — Spot Locomotion with Isaac Lab (2025) — Zero-shot Spot deployment, 4,096 envs on RTX 4090.
- Zhao et al. (2020) — Survey of sim-to-real transfer methods in deep RL.
Ready to run your first parameter sweep?
Go from a single GPU to distributed sweeps across cloud GPUs in minutes.
Get started with Canard