Minimal PyTorch implementation of Proximal Policy Optimization with clipped objective for OpenAI gym environments.
- To test a pre-trained network, run `test.py` or `test_continuous.py`.
- To train a new network, run `PPO.py` or `PPO_continuous.py`.
- All the hyperparameters are in the `PPO.py` or `PPO_continuous.py` file.
- If you are trying to train it on an environment where the action dimension = 1, make sure to check the tensor dimensions in the update function of the PPO class, since I have used `torch.squeeze()` quite a few times. `torch.squeeze()` removes all dimensions of size 1 from a tensor (see the PyTorch documentation for more info); a short illustrative snippet follows this list.
- Number of actors for collecting experience = 1. This could easily be changed by creating multiple instances of ActorCritic networks in the PPO class and using them to collect experience (like A3C and standard PPO); a rough sketch of the idea is shown after this list.
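As a small illustration of why the shapes need checking when the action dimension is 1, the snippet below shows how `torch.squeeze()` collapses a size-1 action dimension (the shapes here are examples, not taken from the repository's code):

```python
import torch

# A batch of 32 actions from an environment whose action dimension is 1.
actions = torch.zeros(32, 1)
print(actions.shape)                 # torch.Size([32, 1])

# torch.squeeze() drops every dimension of size 1, so the batch becomes 1-D.
print(torch.squeeze(actions).shape)  # torch.Size([32])

# A tensor with no size-1 dimensions is returned unchanged.
rewards = torch.zeros(32)
print(torch.squeeze(rewards).shape)  # torch.Size([32])
```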
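The multi-actor change could look roughly like the sketch below. This is only an illustration of the idea, not code from this repository: it assumes that `PPO.py` defines `Memory` and `PPO` classes with roughly this interface (`policy_old`, `act(state, memory)`, `update(memory)`), and the environment name is just an example.

```python
# Hypothetical sketch: several copies of the old policy collect experience
# into one shared buffer before a single PPO update. The Memory/PPO classes
# and their methods are assumptions about the repo's API, not exact code.
import copy
import gym
from PPO import Memory  # assumed rollout buffer from PPO.py

def collect_with_multiple_actors(ppo, n_actors=4, steps_per_actor=500,
                                 env_name="LunarLander-v2"):
    memory = Memory()                                   # shared rollout buffer
    actors = [copy.deepcopy(ppo.policy_old) for _ in range(n_actors)]
    envs = [gym.make(env_name) for _ in range(n_actors)]

    for actor, env in zip(actors, envs):
        state = env.reset()
        for _ in range(steps_per_actor):
            action = actor.act(state, memory)           # assumed to log state/action/logprob
            state, reward, done, _ = env.step(action)
            memory.rewards.append(reward)
            memory.is_terminals.append(done)
            if done:
                state = env.reset()

    ppo.update(memory)                                  # one PPO update on the pooled data
```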
Trained and tested on:
- Python 3.6
- PyTorch 1.0
- NumPy 1.15.3
- gym 0.10.8
- Pillow 5.3.0
| PPO Discrete LunarLander-v2 (1200 episodes) | PPO Continuous BipedalWalker-v2 (4000 episodes) |
| --- | --- |