NVlabs · imgeorgiev · Apr 10, 2023 · Apr 11, 2023 · Apr 12, 2023 · Apr 12, 2023
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,27 @@
+**__pycache__/
+**.ipynb_checkpoints/
+*outputs/
+*.swp
+*.swo
+tags
+**.out
+**.log
+**.pdf
+
+dflex/dflex/kernels*/
+**logs/
+jobs/
+
+**data/
+**.egg-info/
+
+wandb/
+checkpoints/
+multirun/
+
+scripts/sweeps/
+scripts/outputs
+scripts/runs/
+good-shit
+
+**.DS_Store
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "externals/svg"]
+	path = externals/svg
+	url = git@github.com:imgeorgiev/svg.git
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -0,0 +1,6 @@
+{
+    "[python]": {
+        "editor.defaultFormatter": "ms-python.black-formatter"
+    },
+    "python.formatting.provider": "none",
+}
diff --git a/README.md b/README.md
@@ -1,110 +1,66 @@
-# SHAC
+# Adaptive Horizon Actor Critic (AHAC)
 
-This repository contains the implementation for the paper [Accelerated Policy Learning with Parallel Differentiable Simulation](https://short-horizon-actor-critic.github.io/) (ICLR 2022).
+This repository contains the implementation for the paper [Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation](https://adaptive-horizon-actor-critic.github.io/) (ICML 2024).
 
+In this paper, we build on previous work in policy optimization with differentiable simulation to create Adaptive Horizon Actor Critic (AHAC). Our approach deals with gradient error arising from stiff contact by dynamically adapting its model-based horizon to fit one robot gait and avoid excessive contact. This results in a more performant and easier-to-use algorithm than its predecessor [Short Horizon Actor Critic (SHAC)](https://short-horizon-actor-critic.github.io/), while also outperforming PPO by 40% across a set of high-dimensional locomotion tasks.
 
-
-In this paper, we present a GPU-based differentiable simulation and propose a policy learning method named SHAC leveraging the developed differentiable simulation. We provide a comprehensive benchmark set for policy learning with differentiable simulation. The benchmark set contains six robotic control problems for now as shown in the figure below. 
-
-<p align="center">
-    <img src="figures/envs.png" alt="envs" width="800" />
-</p>
+[![Watch the video](figures/envs.png)](https://adaptive-horizon-actor-critic.github.io/media/all_envs_trimmed.mp4)
 
 ## Installation
 
-- `git clone https://github.com/NVlabs/DiffRL.git --recursive`
-
-- The code has been tested on 
-  - Operating System: Ubuntu 16.04, 18.04, 20.04, 21.10, 22.04
-  - Python Version: 3.7, 3.8
-  - GPU: TITAN X, RTX 1080, RTX 2080, RTX 3080, RTX 3090, RTX 3090 Ti
-
-#### Prerequisites
-
-- In the project folder, create a virtual environment in Anaconda:
-
-  ```
-  conda env create -f diffrl_conda.yml
-  conda activate shac
-  ```
-
-- dflex
+ `git clone https://github.com/imgeorgiev/DiffRL --recursive`
 
-  ```
-  cd dflex
-  pip install -e .
-  ```
 
-- rl_games, forked from [rl-games](https://github.com/Denys88/rl_games) (used for PPO and SAC training):
-
-  ````
-  cd externals/rl_games
-  pip install -e .
-  ````
-
-- Install an older version of protobuf required for TensorboardX:
-  ````
-  pip install protobuf==3.20.0
-  ````
-
-#### Test Examples
-
-A test example can be found in the `examples` folder.
+Set up this project with Anaconda:
+```
+conda env create -f environment.yml
+conda activate diffrl
+pip install -e dflex
+pip install -e .
+```
 
+For an unknown reason, you need to symlink the CUDA libraries for ninja to work:
 ```
-python test_env.py --env AntEnv
+ln -s $CONDA_PREFIX/lib $CONDA_PREFIX/lib64
 ```
 
-If the console outputs `Finish Successfully` in the last line, the code installation succeeds.
+If you want SVG as a baseline:
 
+```
+pip install -e externals/svg
+```
 
 ## Training
 
-Running the following commands in `examples` folder allows to train Ant with SHAC.
 ```
-python train_shac.py --cfg ./cfg/shac/ant.yaml --logdir ./logs/Ant/shac
+python train.py alg=ahac env=ant
 ```
 
-We also provide a one-line script in the `examples/train_script.sh` folder to replicate the results reported in the paper for both our method and for baseline method. The results might slightly differ from the paper due to the randomness of the cuda and different Operating System/GPU/Python versions. The plot reported in paper is produced with TITAN X on Ubuntu 16.04.
-
-#### SHAC (Our Method)
+where you can change `alg` and `env` freely based on the provided Hydra configurations.
 
-For example, running the following commands in `examples` folder allows to train Ant and SNU Humanoid (Humanoid MTU in the paper) environments with SHAC respectively for 5 individual seeds.
+The training script outputs TensorBoard logs by default. If you want to use wandb, add the flag `general.run_wandb=True` and specify `wandb.project=<name>` and `wandb.entity=<entity>`.
 
-```
-python train_script.py --env Ant --algo shac --num-seeds 5
-```
+Note that dflex is not fully deterministic due to GPU acceleration and cannot reproduce the same results given the same seed.
 
-```
-python train_script.py --env SNUHumanoid --algo shac --num-seeds 5
-```
 
-#### Baseline Algorithms
+## Testing
 
-For example, running the following commands in `examples` folder allows to train Ant environment with PPO implemented in RL_games for 5 individual seeds,
+You can load a policy and evaluate it without training. This works only for the AHAC and SHAC algorithms.
 
 ```
-python train_script.py --env Ant --algo ppo --num-seeds 5
+python train.py alg=ahac env=ant general.train=False general.checkpoint=<policy_path>
 ```
 
-## Testing
+You can also control the number of eval episodes with `env.player.games_num=10`.
 
-To test the trained policy, you can input the policy checkpoint into the training script and use a `--play` flag to indicate it is for testing. For example, the following command allows to test a trained policy (assume the policy is located in `logs/Ant/shac/policy.pt`)
+## Generating rendering files
 
-```
-python train_shac.py --cfg ./cfg/shac/ant.yaml --checkpoint ./logs/Ant/shac/policy.pt --play [--render]
-```
+The `general.render` flag indicates whether to export a video of the task execution. If set, the exported video is encoded in `.usd` format and stored in the `examples/output` folder. To visualize the exported `.usd` file, refer to [USD at NVIDIA](https://developer.nvidia.com/usd).
 
-The `--render` flag indicates whether to export the video of the task execution. If does, the exported video is encoded in `.usd` format, and stored in the `examples/output` folder. To visualize the exported `.usd` file, refer to [USD at NVIDIA](https://developer.nvidia.com/usd).
+```
+python train.py alg=ahac env=ant general.train=False general.render=True general.checkpoint=<policy_path> env.config.stochastic_init=False env.player.games_num=1 env.player.num_actors=1 env.config.num_envs=1 alg.eval_runs=1
+```
 
-## Citation
+Once you have generated a rendering file, you can load it in USD Composer to generate an image or video render like the one above. To install Omniverse, follow the [Omniverse Install Page](https://www.nvidia.com/en-us/omniverse/download/). Then install [USD Composer](https://www.nvidia.com/en-us/omniverse/apps/create/) from the Omniverse GUI. Start USD Composer and load the `.usd` files generated by the script above.
 
-If you find our paper or code is useful, please consider citing:
-```kvk
-  @inproceedings{xu2021accelerated,
-    title={Accelerated Policy Learning with Parallel Differentiable Simulation},
-    author={Xu, Jie and Makoviychuk, Viktor and Narang, Yashraj and Ramos, Fabio and Matusik, Wojciech and Garg, Animesh and Macklin, Miles},
-    booktitle={International Conference on Learning Representations},
-    year={2021}
-  }
-```
+