<aside> <img src="/icons/burst_gray.svg" alt="/icons/burst_gray.svg" width="40px" />

Domains: Reinforcement Learning, Simulation in Robotics, Reward Engineering

</aside>

https://github.com/Advait2211/quad_move_eklavya

Overview

This project simulates Unitree’s Go2 quadruped in MuJoCo and trains it to walk using Proximal Policy Optimization (PPO) and reward engineering.

Key Concepts

Reinforcement Learning: A type of machine learning in which an agent learns to make decisions by taking actions in an environment to maximize cumulative reward.

Simulation: Creating a virtual environment to model and test a system or robot before deploying it in the real world.

PPO (Proximal Policy Optimization): A policy-gradient reinforcement learning algorithm that keeps training stable by limiting how far each update can move the policy away from the previous one.
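As a minimal sketch of what a limited update looks like, the per-sample clipped surrogate objective from PPO can be written in a few lines. The function name and the default `clip_eps=0.2` are illustrative assumptions, not values taken from this project.

```python
def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     : pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage : advantage estimate for that action
    clip_eps  : clipping ratio (0.2 is a commonly used value, assumed here)
    """
    # Restrict the probability ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes the objective a pessimistic bound: pushing the
    # ratio beyond the clip range in the favourable direction earns nothing extra.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, raising the ratio above `1 + clip_eps` no longer increases the objective, so the optimizer has no incentive to push the policy further in one update. With a negative advantage and a ratio above the range, the unclipped (more negative) term is kept, so a bad action that became more likely is still fully penalized.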

Reward Engineering and Optimization: Designing reward functions that encourage forward motion, stability, and energy efficiency.
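One way these three goals could be combined into a single scalar reward is sketched below. The term structure, the weights, and every name (`RewardWeights`, `walking_reward`) are assumptions for illustration; the project's actual reward terms and coefficients are not listed here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients, chosen only to make the example concrete.
    forward: float = 1.0    # reward per m/s of forward velocity
    energy: float = 0.001   # penalty per unit of squared joint torque
    tilt: float = 0.5       # penalty per rad^2 of body roll/pitch
    alive: float = 0.1      # bonus for each step the robot has not fallen


def walking_reward(forward_velocity: float,
                   joint_torques: Sequence[float],
                   roll: float,
                   pitch: float,
                   fallen: bool,
                   w: RewardWeights = RewardWeights()) -> float:
    """Scalar step reward: forward progress minus effort and tilt penalties."""
    forward_term = w.forward * forward_velocity
    # Squared torque is used here as a simple proxy for energy use.
    energy_penalty = w.energy * sum(t * t for t in joint_torques)
    # Penalise deviation of the trunk from level as a proxy for stability.
    tilt_penalty = w.tilt * (roll * roll + pitch * pitch)
    alive_bonus = 0.0 if fallen else w.alive
    return forward_term - energy_penalty - tilt_penalty + alive_bonus
```

Keeping each term separate makes it easier to log them individually during training and see which one dominates when the gait degrades.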

Approach and Workflow

  1. Simulation Setup: Simulated the Unitree Go2 quadruped in MuJoCo, creating a physics-accurate environment for training and testing.
  2. Algorithm and Reward Function: Trained a walking-gait policy using Proximal Policy Optimization (PPO). The reward encouraged forward velocity, energy efficiency, and stability.
  3. Tuning Hyperparameters: Tweaked the learning rate, batch size, and clipping ratio until the policy started converging and the robot learned to walk.
  4. Testing and Evaluation: Ran trials with friction enabled and with friction removed from the scene to see how the robot’s gait adapted and where it failed.
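The friction comparison in step 4 could be organized as a small harness that runs the same seeds under both conditions and summarizes each one. This is a sketch under stated assumptions: the metrics (distance travelled, fall rate), the `Rollout` signature, and `fake_rollout` are hypothetical, and the numbers returned by the placeholder are made up rather than measured.

```python
from statistics import mean
from typing import Callable, Dict, Tuple

# A rollout takes (friction_enabled, seed) and returns (distance_m, fell).
Rollout = Callable[[bool, int], Tuple[float, bool]]


def compare_friction(rollout: Rollout, n_trials: int = 10) -> Dict[str, Dict[str, float]]:
    """Run identical seeds with and without friction and summarise both."""
    summary: Dict[str, Dict[str, float]] = {}
    for label, friction_enabled in (("friction", True), ("no_friction", False)):
        results = [rollout(friction_enabled, seed) for seed in range(n_trials)]
        summary[label] = {
            "mean_distance_m": mean(d for d, _ in results),
            "fall_rate": sum(1 for _, fell in results if fell) / n_trials,
        }
    return summary


def fake_rollout(friction_enabled: bool, seed: int) -> Tuple[float, bool]:
    """Placeholder for a real MuJoCo episode; values are NOT measured results."""
    return (5.0, False) if friction_enabled else (0.5, seed % 2 == 0)


if __name__ == "__main__":
    for condition, stats in compare_friction(fake_rollout, n_trials=4).items():
        print(condition, stats)
```

In a real run, `fake_rollout` would be replaced by a function that loads the appropriate scene, steps the trained policy for one episode, and reports how far the robot travelled and whether it fell.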