Skip to content

QLearning and A3C algorithms on InvertedDoublePendulum-v2 from OpenAI Gym.

Notifications You must be signed in to change notification settings

MBelniak/USD-Inverted-Double-Pendulum

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

46 Commits
 
 
 
 
 
 
 
 

Repository files navigation

USD-Inverted-Double-Pendulum

Double deep QLearning and A3C algorithms on InvertedDoublePendulum-v2 from OpenAI Gym.

Model preformance with A3C

Requirements:

Mujoco (1.50 for Windows) from https://www.roboti.us/index.html

PyTorch > 1.0

imageio-ffmpeg for recording videos of simulation

Usage:

Run it from src directory:

python main.py

for training with default parameters. Default algorithm is DDQN.

Available parameters:

--algorithm {A3C,DDQN} Algorithm to use.

--load_file LOAD_FILE Custom filename from which to load models before rendering.
By default, trained models are saved to file <algorithm>--<episodes>-<threads>-<discount>-<step_max>-<actor_lr>-<critic_lr>
For example: A3C--1000000-5-0_99-5-0_001-0_001

--threads THREADS Number of threads for A3C.

--episodes EPISODES Number of episodes for training process.

--discount DISCOUNT Discount rate.

--step_max STEP_MAX Max actor's steps before update of global model in A3C.

--actor_lr ACTOR_LR Actor's learning rate.

--critic_lr CRITIC_LR Critic's learning rate.

--eval_repeats EVAL_REPEATS Number of evaluation runs in one performance evaluation. Set to 0 to disable evaluation during training.

-no_log Disable logging during training.

-render Render environment. Before rendering, there must exist a model saved in a file which name is generated based on parameters or explicitly provided.

--lr Learning rate.

--min_episodes We wait "min_episodes" many episodes in order to aggregate enough data before starting to train.

--eps Probability to take a random action during training.

--eps_decay After every episode "eps" is multiplied by "eps_decay" to reduces exploration over time.

--eps_min Minimal value of "eps".

--update_step After "update_step" many episodes the Q-Network is trained "update_repeats" many times with a batch of size "batch_size" from the memory.

--batch_size See above.

--update_repeats See above.

--seed Random seed for reproducibility.

--max_memory_size Size of the replay memory.

--measure_step Every "measure_step" episode the performance is measured.

--measure_repeats The amount of episodes played in to asses performance.

--hidden_dim Hidden dimensions for the Q_network.

--horizon Number of steps taken in the environment before terminating the episode (prevents very long episodes).

--render_step See above.

--num_actions Number of action space to discretize to

About

QLearning and A3C algorithms on InvertedDoublePendulum-v2 from OpenAI Gym.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages