Deep RL Agent using Proximal Policy Optimization for solving the Pong game.
In this project, we train a Deep RL Agent to play the Atari game Pong using the Proximal Policy Optimization (PPO) algorithm. The environment is provided by OpenAI Gym. The Agent perceives the world through raw pixels, and a convolutional neural network is used as the policy. The code in this repo is self-contained, apart from a few dependencies that are installed dynamically via pip.
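For reference, the environment can be created through the Gym API roughly as follows; the exact environment id used in the notebook is an assumption here:

import gym

# 'PongDeterministic-v4' is a common choice; the notebook may use a different id
env = gym.make('PongDeterministic-v4')
print(env.action_space)                      # Discrete(6)
print(env.unwrapped.get_action_meanings())   # names of the six Pong actions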
The architecture of the network consists of 2 convolutional layers followed by 2 fully connected layers with a sigmoid output:
import torch.nn as nn
import torch.nn.functional as F

# set up a convolutional neural net
# the output is the probability of moving right
# P(left) = 1 - P(right)
class Policy(nn.Module):
    def __init__(self):
        super(Policy, self).__init__()
        # each conv layer maps inputsize x inputsize to outputsize x outputsize with
        # outputsize = (inputsize - kernel_size + stride) / stride
        # (round down if not an integer)
        n_filters_1 = 4
        n_filters_2 = 16
        # input: 2x80x80 (two stacked greyscale frames), output: 4x38x38
        self.conv_1 = nn.Conv2d(in_channels=2, out_channels=n_filters_1, kernel_size=6, stride=2, bias=False)
        # input: 4x38x38, output: 16x9x9
        self.conv_2 = nn.Conv2d(in_channels=n_filters_1, out_channels=n_filters_2, kernel_size=6, stride=4)
        # flattened size of the conv output
        self.size = n_filters_2 * 9 * 9
        # two fully connected layers ending in a single sigmoid unit
        self.fc1 = nn.Linear(self.size, 256)
        self.fc2 = nn.Linear(256, 1)
        self.sig = nn.Sigmoid()

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.relu(self.conv_2(x))
        # flatten the tensor
        x = x.view(-1, self.size)
        x = F.relu(self.fc1(x))
        return self.sig(self.fc2(x))
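As a quick sanity check, the network can be instantiated and run on a random batch of stacked frames (the batch size and the random input here are purely illustrative):

import torch

policy = Policy()
# a batch of 8 observations, each consisting of two stacked 80x80 frames
dummy_obs = torch.rand(8, 2, 80, 80)
probs = policy(dummy_obs)
print(probs.shape)   # torch.Size([8, 1]) -- one P(right) per observation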
The Agent can choose from six different actions:
- NOOP
- FIRE
- LEFT
- RIGHT
- LEFTFIRE
- RIGHTFIRE
However, it is sufficient to train the Agent using only the actions LEFTFIRE and RIGHTFIRE.
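Because the policy outputs a single probability P(right), an action can be sampled from a Bernoulli distribution and mapped to the two firing actions. The index mapping below (RIGHTFIRE = 4, LEFTFIRE = 5) is an assumption about the Gym action space and should be checked with env.unwrapped.get_action_meanings():

import torch

# assumed action indices in the Gym Pong action space
RIGHTFIRE, LEFTFIRE = 4, 5

def select_action(policy, obs):
    # obs: tensor of shape (batch, 2, 80, 80)
    with torch.no_grad():
        p_right = policy(obs).squeeze(-1)      # P(RIGHTFIRE), shape (batch,)
    go_right = torch.bernoulli(p_right)        # 1 with probability p_right
    actions = torch.where(go_right.bool(),
                          torch.full_like(go_right, RIGHTFIRE),
                          torch.full_like(go_right, LEFTFIRE)).long()
    # probability of the action actually taken (useful later as the
    # "old" policy probability in the PPO ratio)
    probs = torch.where(go_right.bool(), p_right, 1 - p_right)
    return actions, probs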
The observation space is determined by two temporally adjacent, cropped, downscaled 80x80 greyscale screenshots of the game screen, stacked as two channels.
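A minimal sketch of this kind of preprocessing is shown below; the exact crop boundaries and scaling are assumptions and may differ from what the notebook implements:

import numpy as np

def preprocess_frame(frame):
    # frame: raw 210x160x3 RGB screen from the Atari emulator
    img = frame[34:194]            # crop to the playing field (assumed boundaries)
    img = img[::2, ::2]            # downsample by a factor of 2 -> 80x80
    img = np.mean(img, axis=-1)    # convert to greyscale
    return img / 255.0             # scale to [0, 1]

def stack_frames(frame1, frame2):
    # two temporally adjacent frames become the 2-channel observation
    return np.stack([preprocess_frame(frame1), preprocess_frame(frame2)])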
The reward is given by the game score: the Agent receives +1 when it scores a point and -1 when the opponent scores.
The code in this repository requires NumPy, PyTorch, OpenAI Gym and Jupyter. Make sure those are installed; some of the remaining dependencies are installed directly via pip from within the notebook. Then open the notebook 'pong-PPO.ipynb' and follow the instructions within it to train the Agent.
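The training loop optimizes the PPO objective. For orientation, here is a minimal sketch of a clipped surrogate loss of the kind PPO uses; the function name, the epsilon/beta defaults and the regularization term are illustrative assumptions, not necessarily what the notebook implements. new_probs and old_probs are the probabilities of the actions actually taken under the current and the data-collecting policy, and returns are the (normalized) discounted future rewards.

import torch

def clipped_surrogate(new_probs, old_probs, returns, epsilon=0.1, beta=0.01):
    # probability ratio between the current and the old policy
    ratio = new_probs / old_probs
    # PPO clipped objective: take the pessimistic minimum of the
    # unclipped and clipped surrogates
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon)
    surrogate = torch.min(ratio * returns, clipped * returns)
    # entropy-style regularization term to encourage exploration
    entropy = -(new_probs * torch.log(old_probs + 1e-10)
                + (1 - new_probs) * torch.log(1 - old_probs + 1e-10))
    # maximize this quantity (i.e. use its negative as the loss)
    return torch.mean(surrogate + beta * entropy)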
Without any training, the Agent always loses the game.
With 1000 epochs of training, the Agent wins the game!