Deep RL Agent using Proximal Policy Optimization for solving the Pong game.
In this project, we train a Deep RL Agent to play the Atari game Pong using the Proximal Policy Optimization (PPO) algorithm. The environment is provided by OpenAI Gym. The Agent perceives the world through raw pixels, and a convolutional neural network is used as the policy. The code in this repo is self-contained, apart from a few dependencies that are installed dynamically via pip.
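For reference, the environment can be created through the Gym API roughly as follows; the exact environment id used in the notebook is an assumption here:

import gym

# 'PongDeterministic-v4' is a common choice; the notebook may use a different id
env = gym.make('PongDeterministic-v4')
print(env.action_space)                      # Discrete(6)
print(env.unwrapped.get_action_meanings())   # names of the six Pong actions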
The architecture of the network consists of 2 convolutional layers followed by 2 fully connected layers with a sigmoid output:
import torch.nn as nn
import torch.nn.functional as F

# set up a convolutional neural net
# the output is the probability of moving right
# P(left) = 1 - P(right)
class Policy(nn.Module):
    def __init__(self):
        super(Policy, self).__init__()
        # each conv layer maps inputsize x inputsize to outputsize x outputsize with
        # outputsize = (inputsize - kernel_size + stride) / stride
        # (round down if not an integer)
        n_filters_1 = 4
        n_filters_2 = 16
        # input: 2x80x80 (two stacked greyscale frames), output: 4x38x38
        self.conv_1 = nn.Conv2d(in_channels=2, out_channels=n_filters_1, kernel_size=6, stride=2, bias=False)
        # input: 4x38x38, output: 16x9x9
        self.conv_2 = nn.Conv2d(in_channels=n_filters_1, out_channels=n_filters_2, kernel_size=6, stride=4)
        # flattened size of the conv output
        self.size = n_filters_2 * 9 * 9
        # two fully connected layers ending in a single sigmoid unit
        self.fc1 = nn.Linear(self.size, 256)
        self.fc2 = nn.Linear(256, 1)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.relu(self.conv_2(x))
        # flatten the tensor
        x = x.view(-1, self.size)
        x = F.relu(self.fc1(x))
        return self.sig(self.fc2(x))
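As a quick sanity check, the network can be instantiated and run on a random batch of stacked frames (the batch size and the random input here are purely illustrative):

import torch

policy = Policy()
# a batch of 8 observations, each consisting of two stacked 80x80 frames
dummy_obs = torch.rand(8, 2, 80, 80)
probs = policy(dummy_obs)
print(probs.shape)   # torch.Size([8, 1]) -- one P(right) per observation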
The Agent can choose from six different actions:
- NOOP
- FIRE
- LEFT
- RIGHT
- LEFTFIRE
- RIGHTFIRE
However, it is sufficient to train the Agent using only the actions LEFTFIRE and RIGHTFIRE.
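Because the policy outputs a single probability P(right), an action can be sampled from a Bernoulli distribution and mapped to the two firing actions. The index mapping below (RIGHTFIRE = 4, LEFTFIRE = 5) is an assumption about the Gym action space and should be checked with env.unwrapped.get_action_meanings():

import torch

# assumed action indices in the Gym Pong action space
RIGHTFIRE, LEFTFIRE = 4, 5

def select_action(policy, obs):
    # obs: tensor of shape (batch, 2, 80, 80)
    with torch.no_grad():
        p_right = policy(obs).squeeze(-1)      # P(RIGHTFIRE), shape (batch,)
    go_right = torch.bernoulli(p_right)        # 1 with probability p_right
    actions = torch.where(go_right.bool(),
                          torch.full_like(go_right, RIGHTFIRE),
                          torch.full_like(go_right, LEFTFIRE)).long()
    # probability of the action actually taken (useful later as the
    # "old" policy probability in the PPO ratio)
    probs = torch.where(go_right.bool(), p_right, 1 - p_right)
    return actions, probs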
The observation space is determined by two temporally adjacent, cropped, downscaled 80x80 greyscale screenshots of the game screen, stacked as two channels.
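A minimal sketch of this kind of preprocessing is shown below; the exact crop boundaries and scaling are assumptions and may differ from what the notebook implements:

import numpy as np

def preprocess_frame(frame):
    # frame: raw 210x160x3 RGB screen from the Atari emulator
    img = frame[34:194]            # crop to the playing field (assumed boundaries)
    img = img[::2, ::2]            # downsample by a factor of 2 -> 80x80
    img = np.mean(img, axis=-1)    # convert to greyscale
    return img / 255.0             # scale to [0, 1]

def stack_frames(frame1, frame2):
    # two temporally adjacent frames become the 2-channel observation
    return np.stack([preprocess_frame(frame1), preprocess_frame(frame2)])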
The reward is given by the game score: the Agent receives +1 when it scores a point and -1 when the opponent scores.
The code in this repository requires NumPy, PyTorch, OpenAI Gym and Jupyter. Make sure those are installed; some of the remaining dependencies are installed directly via pip from within the notebook. Then open the notebook 'pong-PPO.ipynb' and follow the instructions within it to train the Agent.
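The training loop optimizes the PPO objective. For orientation, here is a minimal sketch of a clipped surrogate loss of the kind PPO uses; the function name, the epsilon/beta defaults and the regularization term are illustrative assumptions, not necessarily what the notebook implements. new_probs and old_probs are the probabilities of the actions actually taken under the current and the data-collecting policy, and returns are the (normalized) discounted future rewards.

import torch

def clipped_surrogate(new_probs, old_probs, returns, epsilon=0.1, beta=0.01):
    # probability ratio between the current and the old policy
    ratio = new_probs / old_probs
    # PPO clipped objective: take the pessimistic minimum of the
    # unclipped and clipped surrogates
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon)
    surrogate = torch.min(ratio * returns, clipped * returns)
    # entropy-style regularization term to encourage exploration
    entropy = -(new_probs * torch.log(old_probs + 1e-10)
                + (1 - new_probs) * torch.log(1 - old_probs + 1e-10))
    # maximize this quantity (i.e. use its negative as the loss)
    return torch.mean(surrogate + beta * entropy)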
Without any training, the Agent always loses the game.
With 1000 epochs of training, the Agent wins the game!