Deep Deterministic Policy Gradient (DDPG) on HalfCheetah-v2

Dependencies
- tensorflow
- gym
- mujoco-py

Run!
To train or test the agent, type one of the following in the terminal:

    python main.py --mode train
    python main.py --mode test

Results
After roughly 18,000 episodes, the mean reward converges to 2700.
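
For readers wondering how the `--mode` switch from the Run! step might be wired up, here is a minimal sketch using Python's standard `argparse` module. It is an assumption about how an entry point like `main.py` could be organised, not the repository's actual code: the function names `parse_args` and `main` and the placeholder return values are hypothetical, and the real training and evaluation loops are left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; only the --mode flag is taken from the README."""
    parser = argparse.ArgumentParser(description="DDPG on HalfCheetah-v2")
    parser.add_argument(
        "--mode",
        choices=["train", "test"],  # the two modes named in the README
        required=True,
        help="train a new agent or test a previously trained one",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Dispatch on the selected mode (hypothetical placeholders only)."""
    args = parse_args(argv)
    if args.mode == "train":
        # A real script would build the environment and run DDPG training here.
        return "train"
    # A real script would load saved weights and run evaluation episodes here.
    return "test"
```

Calling `main(["--mode", "train"])` selects the training branch. Restricting `choices` to the two values means any other mode is rejected by `argparse` with a usage message before any environment is created.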