Minimal PyTorch implementation of Proximal Policy Optimization with clipped objective for OpenAI gym environments.
- To test a pre-trained network, run `test.py` or `test_continuous.py`.
- To train a new network, run `PPO.py` or `PPO_continuous.py`.
- All the hyperparameters are in the `PPO.py` or `PPO_continuous.py` file.
- If you are trying to train it on an environment where the action dimension = 1, make sure to check the tensor dimensions in the update function of the PPO class, since I have used `torch.squeeze()` quite a few times. `torch.squeeze()` removes all dimensions of size 1 from a tensor (see the PyTorch documentation for more info); a short illustrative snippet follows this list.
- Number of actors for collecting experience = 1. This could easily be changed by creating multiple instances of ActorCritic networks in the PPO class and using them to collect experience (like A3C and standard PPO); a rough sketch of the idea is shown after this list.
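As a small illustration of why the shapes need checking when the action dimension is 1, the snippet below shows how `torch.squeeze()` collapses a size-1 action dimension (the shapes here are examples, not taken from the repository's code):

```python
import torch

# A batch of 32 actions from an environment whose action dimension is 1.
actions = torch.zeros(32, 1)
print(actions.shape)                 # torch.Size([32, 1])

# torch.squeeze() drops every dimension of size 1, so the batch becomes 1-D.
print(torch.squeeze(actions).shape)  # torch.Size([32])

# A tensor with no size-1 dimensions is returned unchanged.
rewards = torch.zeros(32)
print(torch.squeeze(rewards).shape)  # torch.Size([32])
```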
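The multi-actor change could look roughly like the sketch below. This is only an illustration of the idea, not code from this repository: it assumes that `PPO.py` defines `Memory` and `PPO` classes with roughly this interface (`policy_old`, `act(state, memory)`, `update(memory)`), and the environment name is just an example.

```python
# Hypothetical sketch: several copies of the old policy collect experience
# into one shared buffer before a single PPO update. The Memory/PPO classes
# and their methods are assumptions about the repo's API, not exact code.
import copy
import gym
from PPO import Memory  # assumed rollout buffer from PPO.py

def collect_with_multiple_actors(ppo, n_actors=4, steps_per_actor=500,
                                 env_name="LunarLander-v2"):
    memory = Memory()                                   # shared rollout buffer
    actors = [copy.deepcopy(ppo.policy_old) for _ in range(n_actors)]
    envs = [gym.make(env_name) for _ in range(n_actors)]

    for actor, env in zip(actors, envs):
        state = env.reset()
        for _ in range(steps_per_actor):
            action = actor.act(state, memory)           # assumed to log state/action/logprob
            state, reward, done, _ = env.step(action)
            memory.rewards.append(reward)
            memory.is_terminals.append(done)
            if done:
                state = env.reset()

    ppo.update(memory)                                  # one PPO update on the pooled data
```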
Trained and tested on:
- Python 3.6
- PyTorch 1.0
- NumPy 1.15.3
- gym 0.10.8
- Pillow 5.3.0
| PPO Discrete LunarLander-v2 (1200 episodes) | PPO Continuous BipedalWalker-v2 (4000 episodes) |
| --- | --- |