<aside> <img src="/icons/burst_gray.svg" alt="/icons/burst_gray.svg" width="40px" />

Domains: Robotics, Reinforcement Learning, Flow Policy Optimization, Sim2Real

</aside>

https://github.com/vruga/lerobot-sim2real

Overview

The project aims to develop robust robot control policies using advanced reinforcement learning techniques, with Proximal Policy Optimization (PPO) serving as the baseline and Flow Policy Optimization (FPO) investigated as a novel alternative. The core objective is to train these policies efficiently in a fast, GPU-parallelized simulation environment and then deploy them to physical robots zero-shot, meaning they perform real-world tasks without any additional real-world data or fine-tuning.

Key Concepts

Reinforcement Learning (RL): A type of machine learning where an agent learns to make optimal decisions through trial and error in order to maximize a cumulative reward signal.
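As a toy illustration of trial-and-error learning, here is a minimal epsilon-greedy bandit agent that discovers which action yields the highest reward (purely didactic; the project itself uses policy-gradient methods on visual observations):

```python
import random

def run_bandit(arm_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm,
    sometimes explore, and keep a running mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    values = [0.0] * len(arm_means)  # estimated mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(arm_means))                        # explore
        else:
            a = max(range(len(arm_means)), key=lambda i: values[i])  # exploit
        r = arm_means[a] + rng.gauss(0, 0.1)  # noisy reward signal
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
        total += r
    return values, total

values, total = run_bandit([0.2, 0.5, 0.9])
# the agent's estimates converge toward the true means and it
# concentrates its pulls on the best arm (index 2)
```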

Flow Policy Optimization (FPO): An advanced RL algorithm that brings flow-matching policies into PPO-style training, aiming for better sample efficiency and greater training stability than traditional methods like PPO.

Sim2Real (Simulation-to-Real-World Transfer): The process of training a robot policy in a simulated environment and then transferring it directly to a physical, real-world robot.

Zero-shot Deployment: The ability to deploy a policy trained purely in simulation to a real-world robot without any further fine-tuning or retraining on real-world data.

ManiSkill: A fast, GPU-parallelized simulator used to generate large amounts of training data efficiently for robotic manipulation tasks.
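The benefit of GPU parallelization is that thousands of simulated scenes advance in lockstep on each step call. The toy sketch below steps a batch of trivial 1-D environments together to illustrate the batched interface (illustrative only; ManiSkill's real API is Gym-style, e.g. creating an environment with a `num_envs` argument and RGB observation mode):

```python
import random

class BatchedPointEnv:
    """Toy batch of N independent 1-D 'reach the goal at 0' environments,
    stepped together the way a GPU-parallel simulator steps many scenes
    in lockstep (illustrative, not ManiSkill's API)."""
    def __init__(self, num_envs, seed=0):
        self.rng = random.Random(seed)
        self.num_envs = num_envs
        self.pos = [self.rng.uniform(-1, 1) for _ in range(num_envs)]

    def step(self, actions):
        # one synchronized step across the whole batch
        rewards = []
        for i, a in enumerate(actions):
            self.pos[i] += max(-0.1, min(0.1, a))  # clipped action
            rewards.append(-abs(self.pos[i]))      # closer to the goal is better
        return list(self.pos), rewards

env = BatchedPointEnv(num_envs=4)
obs, rew = env.step([-p for p in env.pos])  # move every env toward its goal
```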

Approach and Workflow

1. Environment Setup and Implementation

The initial phase involved setting up the lerobot-sim2real repository and building a simulation aligned with the real environment where the robot is to be deployed. This included resolving hardware compatibility challenges with the SO100 robotic arm, configuring the camera system for visual input processing, and selecting an environment-overlay approach to ensure seamless integration between simulation and physical hardware.
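The environment-overlay idea can be pictured as compositing the simulator's rendering of the robot and object over the real camera frame, so that observations look alike in both domains. A per-pixel sketch (illustrative only, not the repository's actual implementation):

```python
def overlay(sim_pixel, real_pixel, mask):
    """Composite a simulated render over a real camera frame.
    Where mask is 1 the sim pixel (rendered robot/cube) is kept;
    where mask is 0 the real background shows through; fractional
    mask values blend the two."""
    return tuple(
        int(mask * s + (1 - mask) * r)
        for s, r in zip(sim_pixel, real_pixel)
    )

# one pixel: sim renders the arm (mask=1); elsewhere, real background
arm = overlay((255, 0, 0), (10, 10, 10), 1)   # keeps the sim pixel
bg = overlay((255, 0, 0), (10, 10, 10), 0)    # keeps the real pixel
```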

2. Simulation-Only Training

Policies are trained exclusively in simulation using reinforcement learning, with no real-world data collection. Proximal Policy Optimization (PPO) serves as the baseline, while Flow Policy Optimization (FPO) is explored as an advanced method to improve sample efficiency and training stability. The policies use visual input (RGB images) for the target task: picking up a cube.
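PPO's central idea, the clipped surrogate objective, can be sketched in plain Python. This is a didactic sketch over per-sample log-probabilities and advantages, not the repository's training code:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate objective, returned as a loss (negated,
    so minimizing it maximizes the objective). Probability ratios far
    from 1 are clipped so a single update cannot move the policy too
    far from the policy that collected the data."""
    losses = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                           # pi_new / pi_old
        unclipped = ratio * adv
        clipped = max(min(ratio, 1 + eps), 1 - eps) * adv   # clip ratio to [1-eps, 1+eps]
        losses.append(-min(unclipped, clipped))             # pessimistic bound
    return sum(losses) / len(losses)

# identical policies: ratio = 1, so the loss is just -mean(advantage)
loss = ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0])
```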

3. Bridging Sim-to-Real

To bridge the sim-to-real gap, techniques such as domain randomization are applied. These methods vary simulation parameters (e.g., cube size and spawn position) during training so the trained policy generalizes better to the real robot.
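A minimal sketch of per-episode domain randomization, with hypothetical parameter names and ranges (the repository's actual randomization settings may differ):

```python
import random

def sample_episode_params(rng):
    """Draw randomized simulation parameters for one training episode,
    so the policy never overfits to a single fixed scene.
    Parameter names and ranges here are illustrative."""
    return {
        "cube_size_m": rng.uniform(0.02, 0.04),        # cube edge length
        "cube_spawn_xy": (rng.uniform(-0.1, 0.1),
                          rng.uniform(-0.1, 0.1)),     # spawn position on table
        "light_intensity": rng.uniform(0.5, 1.5),      # lighting scale
        "camera_jitter_deg": rng.uniform(-2.0, 2.0),   # camera-extrinsics noise
    }

rng = random.Random(0)
params = [sample_episode_params(rng) for _ in range(1000)]
```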

4. Direct Hardware Transfer & Evaluation

The trained policy is deployed directly to the SO100 robot arm. Performance is measured by the success rate on the cube-picking task and by robustness to environmental variations (e.g., lighting).
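Evaluation then reduces to computing success rates per condition. A minimal sketch, where condition names and trial data are illustrative:

```python
def success_rate(outcomes):
    """Fraction of successful trials (1 = cube picked, 0 = failed)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def evaluate(results_by_condition):
    """Success rate per environmental condition (e.g. lighting setups),
    to quantify both overall performance and robustness."""
    return {cond: success_rate(trials)
            for cond, trials in results_by_condition.items()}

report = evaluate({
    "bright_light": [1, 1, 0, 1, 1],  # hypothetical trial outcomes
    "dim_light":    [1, 0, 0, 1, 0],
})
```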