Double deep Q-learning (DDQN) and A3C algorithms applied to InvertedDoublePendulum-v2 from OpenAI Gym.
MuJoCo (1.50 for Windows) from
PyTorch > 1.0
imageio-ffmpeg for recording videos of simulation
Run it from src
for training with default parameters. Default algorithm is DDQN.
--algorithm {A3C,DDQN}
Algorithm to use.
--load_file LOAD_FILE
Custom filename from which to load models before rendering.
By default, trained models are saved to a file named <algorithm>--<episodes>-<threads>-<discount>-<step_max>-<actor_lr>-<critic_lr>
For example: A3C--1000000-5-0_99-5-0_001-0_001
--threads THREADS
Number of threads for A3C.
--episodes EPISODES
Number of episodes for training process.
--discount DISCOUNT
Discount rate.
--step_max STEP_MAX
Maximum number of steps an actor takes before updating the global model in A3C.
--actor_lr ACTOR_LR
Actor's learning rate.
--critic_lr CRITIC_LR
Critic's learning rate.
--eval_repeats EVAL_REPEATS
Number of evaluation runs in one performance evaluation. Set to 0 to disable evaluation during training.
Disable logging during training.
Render the environment. Before rendering, a saved model must exist in a file
whose name is either generated from the parameters or provided explicitly.
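The default file name described above can be reproduced with a few lines of Python. This is a minimal sketch, not the repository's own code: the helper name `default_model_filename` is hypothetical, and the rule that dots in numeric values become underscores is inferred from the example A3C--1000000-5-0_99-5-0_001-0_001.

```python
def default_model_filename(algorithm, episodes, threads, discount,
                           step_max, actor_lr, critic_lr):
    """Hypothetical helper that builds a name of the form
    <algorithm>--<episodes>-<threads>-<discount>-<step_max>-<actor_lr>-<critic_lr>.

    Assumption: dots in numeric values are replaced by underscores,
    as suggested by the example file name in this README.
    """
    parts = [episodes, threads, discount, step_max, actor_lr, critic_lr]
    encoded = "-".join(str(part).replace(".", "_") for part in parts)
    return f"{algorithm}--{encoded}"


print(default_model_filename("A3C", 1000000, 5, 0.99, 5, 0.001, 0.001))
# prints A3C--1000000-5-0_99-5-0_001-0_001
```

Very small values that Python prints in scientific notation (for example 1e-05) would keep that notation in this sketch; how the project handles such values is not stated here.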
Learning rate.
Training starts only after "min_episodes" episodes have been played, so that enough data is collected first.
Probability to take a random action during training.
After every episode "eps" is multiplied by "eps_decay" to reduce exploration over time.
Minimal value of "eps".
Every "update_step" episodes the Q-Network is trained "update_repeats" times, each time on a batch of size "batch_size" sampled from the replay memory.
See above.
See above.
Random seed for reproducibility.
Size of the replay memory.
Performance is measured every "measure_step" episodes.
Number of episodes played to assess performance.
Hidden dimensions for the Q_network.
Number of steps taken in the environment before terminating the episode (prevents very long episodes).
See above.
Number of discrete actions into which the continuous action space is divided.
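How the schedule parameters above interact can be sketched in plain Python. This is an illustration under stated assumptions, not the project's training loop: the function names and "eps_min" are hypothetical labels, the exact conditions (for example, whether updates fire on episodes divisible by "update_step") are a guess at one reasonable reading of the descriptions, and the action bounds of [-1, 1] are assumed rather than read from the environment.

```python
def decay_eps(eps, eps_decay, eps_min):
    """Multiply eps by eps_decay after an episode, never dropping below eps_min."""
    return max(eps * eps_decay, eps_min)


def discretize_actions(n_actions, low=-1.0, high=1.0):
    """Return n_actions evenly spaced values covering [low, high].

    The bounds are an assumption; in practice they would come from the
    environment's action space.
    """
    if n_actions == 1:
        return [(low + high) / 2]
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


def run_schedule(episodes, min_episodes, update_step, update_repeats,
                 measure_step, eps, eps_decay, eps_min):
    """Return one (episode, eps, n_updates, measured) tuple per episode."""
    log = []
    for episode in range(episodes):
        # An episode would be played here with exploration rate eps,
        # storing its transitions in the replay memory.
        warmed_up = episode >= min_episodes
        # Train update_repeats times every update_step episodes once warmed up.
        n_updates = update_repeats if warmed_up and episode % update_step == 0 else 0
        measured = episode % measure_step == 0
        log.append((episode, eps, n_updates, measured))
        eps = decay_eps(eps, eps_decay, eps_min)
    return log


if __name__ == "__main__":
    for row in run_schedule(episodes=6, min_episodes=2, update_step=2,
                            update_repeats=3, measure_step=3,
                            eps=1.0, eps_decay=0.5, eps_min=0.2):
        print(row)
    print(discretize_actions(5))
```

With these toy values no training happens in the first two episodes, three updates run on episodes 2 and 4, and "eps" falls from 1.0 until it is clamped at 0.2.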