πŸš€ PPO Lunar Lander Agent

Author: Ginni Garg

This repository contains a Reinforcement Learning (RL) model trained to safely land a spacecraft in the LunarLander-v2 environment.

Even if you are completely new to Reinforcement Learning, this README will help you understand:

  • What this project does
  • What PPO is
  • What LunarLander is
  • How the model was trained
  • How to use the model
  • How to reproduce results

πŸŒ• What is This Project?

This project trains an AI agent to land a spacecraft safely on the moon.

The spacecraft must:

  • Control its engines
  • Avoid crashing
  • Land between two flags
  • Use fuel efficiently

The agent learns by trial and error β€” just like a human learning a video game.


🧠 What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of Machine Learning where:

  1. An agent interacts with an environment
  2. It takes actions
  3. It receives rewards or penalties
  4. It learns to maximize total reward

Think of it like training a dog:

  • Good behavior β†’ treat (reward)
  • Bad behavior β†’ no treat (penalty)

Over time, the agent learns the best strategy.
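The loop above (agent acts, environment responds with a reward) can be sketched in a few lines of Python. This uses a made-up one-dimensional toy environment, not LunarLander, purely to show the interaction cycle; the "policy" here is hard-coded rather than learned:

```python
class ToyEnv:
    """A toy 1-D environment: the agent starts at position 0
    and earns a big reward for reaching position 3."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right
        self.pos += 1 if action == 1 else -1
        done = self.pos == 3
        reward = 10.0 if done else -1.0  # small penalty per step, reward at goal
        return self.pos, reward, done

env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = 1  # a fixed "always move right" policy; RL would learn this by trial and error
    obs, reward, done = env.step(action)
    total_reward += reward
print("episode return:", total_reward)  # -> episode return: 8.0
```

Three steps reach the goal: two -1.0 penalties plus the +10.0 landing-on-goal reward give a return of 8.0.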


πŸš€ What is LunarLander-v2?

LunarLander is a simulation environment from Gymnasium.

The goal:

  • Land a spacecraft safely between two flags.

The agent receives:

  • Positive reward for landing successfully
  • Negative reward for crashing
  • Small penalties for wasting fuel

πŸ” Environment Details

Observation Space (What the Agent Sees)

The agent receives 8 values:

Index   Meaning
-----   -------
0       Horizontal position
1       Vertical position
2       Horizontal velocity
3       Vertical velocity
4       Angle
5       Angular velocity
6       Left leg touching ground (0 or 1)
7       Right leg touching ground (0 or 1)

These numbers describe the spacecraft’s current state.


Action Space (What the Agent Can Do)

There are 4 possible actions:

Action  Meaning
------  -------
0       Do nothing
1       Fire left engine
2       Fire main engine
3       Fire right engine

The agent must choose the correct action at each time step.
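To show how the 8-value observation maps to one of the 4 actions, here is a hypothetical hand-written policy (illustration only, not the trained PPO model; the thresholds are made up):

```python
def heuristic_policy(obs):
    """A hypothetical hand-written policy for illustration only.

    obs = [x, y, vx, vy, angle, angular_velocity, left_leg, right_leg]
    Returns one of the 4 discrete actions.
    """
    x, y, vx, vy, angle, ang_vel, left_leg, right_leg = obs
    if left_leg and right_leg:
        return 0  # both legs touching: do nothing
    if vy < -0.5:
        return 2  # falling too fast: fire main engine
    if angle > 0.1:
        return 3  # tilted one way: fire an orientation engine
    if angle < -0.1:
        return 1  # tilted the other way: fire the opposite engine
    return 0      # otherwise: do nothing

# Falling fast, upright, legs in the air -> fire main engine
obs = [0.0, 1.0, 0.0, -0.8, 0.0, 0.0, 0, 0]
print(heuristic_policy(obs))  # -> 2
```

The trained PPO agent replaces this hand-written logic with a neural network that learned its own mapping from observations to actions.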


πŸ€– What Algorithm Was Used?

Proximal Policy Optimization (PPO)

PPO is a popular and stable Reinforcement Learning algorithm.

Why PPO?

  • Stable training
  • Good performance
  • Widely used in industry
  • Balances exploration and exploitation

It updates the policy in small, constrained steps, clipping how far each update can move the policy, to avoid instability.
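Those small steps come from PPO's clipped surrogate objective: the ratio between the new and old policy's action probabilities is clipped to [1 - Ξ΅, 1 + Ξ΅] before being multiplied by the advantage. A minimal per-sample sketch, assuming the common default Ξ΅ = 0.2:

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO's per-sample clipped objective:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# A large policy change (ratio 1.5) gets no extra credit beyond the clip:
print(clipped_surrogate(1.5, advantage=1.0))   # -> 1.2
# A small change passes through unclipped:
print(clipped_surrogate(1.05, advantage=1.0))  # -> 1.05
```

Because the objective is capped, the optimizer gains nothing from pushing the policy far from its previous version in a single update, which is what keeps training stable.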


βš™οΈ Training Details

Model Architecture

  • Policy: MLP (Multi-Layer Perceptron)
  • Framework: Stable-Baselines3
  • Algorithm: PPO

Hyperparameters Used

```python
from stable_baselines3 import PPO

model = PPO(
    policy="MlpPolicy",
    env="LunarLander-v2",
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)
```

Evaluation result: mean_reward = 212.56 +/- 94.26
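The reported mean_reward +/- std comes from running the trained policy for several evaluation episodes and aggregating the episode returns (Stable-Baselines3 provides `evaluate_policy` for this). A sketch of the aggregation itself, using made-up episode returns rather than the real evaluation data:

```python
import statistics

# Hypothetical episode returns for illustration only; the reported
# 212.56 +/- 94.26 came from evaluating the actual trained model.
episode_returns = [260.0, 110.0, 295.0, 180.0, 240.0]

mean_reward = statistics.mean(episode_returns)
std_reward = statistics.pstdev(episode_returns)  # population std (ddof=0)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

The large standard deviation in the reported result (94.26 against a mean of 212.56) indicates that episode outcomes still vary considerably from landing to landing.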