This example implements a task that sleeps for `{runtime}` seconds and holds a NumPy array of `{memory}` megabytes in size. The runtime can be measured with `timeit` and the memory usage with `memray`. It runs using Dask, a distributed computing framework, and `dask_jobqueue`, a library to launch Dask workers as jobs on HPC clusters.
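The task itself can be as simple as the following sketch (the real `main.py` may differ; the `runtime` and `memory` parameters mirror the config fields above):

```python
import time

import numpy as np


def task(runtime: int, memory: int) -> None:
    """Hold a NumPy array of `memory` MB while sleeping for `runtime` seconds."""
    # One float64 takes 8 bytes, so allocate memory * 1e6 / 8 elements.
    data = np.ones(int(memory * 1e6 / 8), dtype=np.float64)
    time.sleep(runtime)
    del data
```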
The main function is a Python process that runs the Dask scheduler, which manages the workers. The workers are launched using the `dask_jobqueue` library, which submits them as jobs to the HPC cluster. It is important that the scheduler does not run on the login node and uses the same HPC modules environment as the workers. You can use the HPC scheduler (PBS or Slurm) to submit this Python script from the login node, or launch it from an interactive session. Here a different approach is used: a conda environment that mirrors the modules environment, so that a similar environment can be created both on a local laptop and on the HPC login node. This environment can be used in VS Code for debugging and on the HPC cluster for launching the scheduler and workers. Actually running the tasks is done by the scheduler and workers, using the modules environment.
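A minimal sketch of this scheduler/worker setup with `dask_jobqueue` (the resource values are placeholders, not the settings from this repository's configs):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# The scheduler runs in this process; each worker is submitted as a Slurm job.
cluster = SLURMCluster(cores=1, memory="2GB", walltime="00:15:00")
cluster.scale(jobs=2)  # submit two worker jobs to the queue

client = Client(cluster)
future = client.submit(task, runtime=1, memory=10)  # task() from the sketch above
print(future.result())  # block until a worker has finished the task
```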
(1) Locally on a laptop or (2) the HPC login node
conda env update -f src/dask_jobqueue/env.yaml
conda activate dask_jobqueue
pip install -r src/dask_jobqueue/requirements.txt
(3) Installing the modules environment for a specific HPC cluster
ml swap cluster/donphan
INSTALL=src/dask_jobqueue srun --pty runner.pbs
To test out the worker environment, start an interactive session on the HPC cluster
qsub -I # not needed if using session via the Open OnDemand portal
ACTIVATE=src/dask_jobqueue source runner.pbs
cd $PBS_O_WORKDIR
All three of these separate setups should be able to run:
python src/dask_jobqueue/main.py --help
Without extra options, the following command will run locally and execute one task with a runtime of 1 second and a memory usage of 10 megabytes.
python src/dask_jobqueue/main.py
OUTPUT:
(dask_jobqueue) [vsc43257@gligar07 hydra_hpc_example]$ python src/dask_jobqueue/main.py
Output directory: /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/outputs/2024-01-04/13-59-46
Path venv: /data/gent/432/vsc43257/venvs//venv_dask_jobqueue_doduo
Path modules: /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/src/dask_jobqueue
[2024-01-04 13:59:49,837][__main__][INFO] - Process ID 248241 executed task {'runtime': 1, 'memory': 10} in 1.0149385700933635 seconds
In the folder `outputs/` you can find the output of each run, organized by date and time. The output folder contains a `.hydra/` folder specifying the complete configuration used for the run, along with any output files written to `cfg.output_dir`.
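The layout looks roughly like this (the `.hydra/` file names are Hydra defaults; `main.log` is illustrative):

```
outputs/2024-01-04/13-59-46/
├── .hydra/
│   ├── config.yaml     # the composed job configuration
│   ├── hydra.yaml      # Hydra's own settings
│   └── overrides.yaml  # the command-line overrides
└── main.log            # logs and other files written to cfg.output_dir
```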
To run on the HPC cluster, we need to launch the workers not locally, but as jobs on cluster worker nodes using `dask_jobqueue`. For this we can use the `cluster=slurm` argument, which uses the Dask cluster defined at `configs/cluster/slurm.yaml`. This will not work if there is an environment mismatch between the scheduler (conda environment) and the workers (modules environment).
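That file is not reproduced here, but assuming it instantiates a `dask_jobqueue.SLURMCluster` through Hydra's `_target_` mechanism, one plausible shape is:

```yaml
# configs/cluster/slurm.yaml (hypothetical sketch, values are placeholders)
_target_: dask_jobqueue.SLURMCluster
cores: 1
memory: 2GB
walltime: "00:15:00"
```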
python src/dask_jobqueue/main.py cluster=slurm
It is also possible to run on the very limited login node by using the conda environment at the workers as well with `cluster=slurm_conda`, but it is highly recommended to submit jobs to the debug cluster instead.
ml swap cluster/donphan
python src/dask_jobqueue/main.py cluster=slurm launcher=submitit_slurm -m
After testing and debugging, large jobs can be launched using two methods: the runner script or submitit.
The runner script does not need a conda environment to launch and is most similar to PBS or SLURM scripts.
ACTIVATE=src/dask_jobqueue SCRIPT=src/dask_jobqueue/main.py sbatch runner.pbs --help
The Hydra Launcher plugin approach requires a conda environment to launch, but is more integrated with the Hydra CLI and other plugins. Note the need for the multirun flag `-m`. The options for `hydra/launcher` are built in by installing the Launcher plugins, whereas `launcher=submitit_slurm` is a config located at `configs/launcher/submitit_slurm` (a sketch follows below). The output of multirun jobs is in the folder `multirun`, not `outputs`.
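The config file itself is not shown here; assuming it overrides the plugin's built-in `hydra/launcher` group, it might look like:

```yaml
# @package _global_
# configs/launcher/submitit_slurm.yaml (hypothetical sketch)
defaults:
  - override /hydra/launcher: submitit_slurm

hydra:
  launcher:
    cpus_per_task: 1  # placeholder resource values
    mem_gb: 2
    timeout_min: 15
```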
# test launch on login node via joblib and conda env
python src/dask_jobqueue/main.py cluster=slurm_conda hydra/launcher=joblib -m
# test launch on login node via submitit_local and conda env
python src/dask_jobqueue/main.py cluster=slurm_conda hydra/launcher=submitit_local -m
# launch on HPC cluster via submitit_slurm and modules env
python src/dask_jobqueue/main.py cluster=slurm launcher=submitit_slurm -m
TODO: currently the submitit launcher does not work, as `submitit_slurm` uses the current interpreter `sys.executable`, which is the conda environment's Python, not the modules environment's Python. Fix PR at facebookresearch/hydra#2830.
Because the multirun output folder has a fixed structure, you can parse the folder with a script and plot the benchmarking metrics. The example script `scripts/sleep_plots.py` creates plots in the multirun folder when the run is executed with a benchmark option; a sketch of the parsing idea follows after the commands below.
ml swap cluster/donphan
# launch interactive session
# ACTIVATE=src/dask_jobqueue SCRIPT=bash srun --pty runner.pbs
# execute the multi-run
python src/dask_jobqueue/main.py +sweep='{runtime: 1, memory: 1},{runtime: 2, memory: 10},{runtime: 3, memory: 100}' task.runtime='${sweep.runtime}' task.memory='${sweep.memory}' benchmark=all hydra/launcher=joblib -m
python scripts/sleep_plots.py
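A minimal sketch of the parsing idea (the real `scripts/sleep_plots.py` may differ; the result file name `benchmark.json` is an assumption):

```python
import json
from pathlib import Path

# Pick the most recent multirun folder (multirun/{DAY}/{TIME}/).
latest = sorted(Path("multirun").glob("*/*"))[-1]

# Each job gets a numbered subfolder (0, 1, 2, ...) with its own outputs.
results = []
for job_dir in sorted(p for p in latest.iterdir() if p.name.isdigit()):
    result_file = job_dir / "benchmark.json"  # assumed result file name
    if result_file.exists():
        results.append(json.loads(result_file.read_text()))

print(results)  # feed these into matplotlib for the runtime/memory plots
```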
Note that a configuration like `task.runtime=1,2,3 benchmark=runtime,memory` would sweep the full cross product, requiring 3 × 2 = 6 tasks, and fail on an interactive cluster with a 5-task queue limit; the `+sweep` list above pairs the parameters instead.
The runtime plot shows the 3 tasks with increasing sleep length.
The memory plot shows the 3 tasks with increasing memory usage.
Note that by default Hydra stores the configuration, logs, and output of each run in a unique folder `outputs/{DAY}/{TIME}/`. The output folder contains a `.hydra/` folder specifying the complete configuration used for the run, along with any output files written to `cfg.output_dir`.
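For reference, a minimal sketch of such a Hydra entry point (the real `src/sleep_hydra/main.py` may differ, and the `config_path` is an assumption):

```python
import logging
import os
import time

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig

log = logging.getLogger(__name__)


@hydra.main(version_base=None, config_path="../../configs", config_name="config")
def main(cfg: DictConfig) -> None:
    print(f"Output directory : {HydraConfig.get().runtime.output_dir}")
    start = time.perf_counter()
    time.sleep(cfg.task.runtime)  # the real task also holds cfg.task.memory MB
    log.info(
        f"Process ID {os.getpid()} executed task "
        f"{{'runtime': {cfg.task.runtime}, 'memory': {cfg.task.memory}}} "
        f"in {time.perf_counter() - start} seconds"
    )


if __name__ == "__main__":
    main()
```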
python src/sleep_hydra/main.py
OUTPUT:
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/outputs/2023-10-20/11-09-33
[2023-10-20 11:09:34,986][__main__][INFO] - Process ID 1595501 executed task {'runtime': 1, 'memory': 10} in 1.000080016994616 seconds
Note that the multiple runs now each have an output folder in `multirun/{DAY}/{TIME}/`. All process IDs are the same, as the tasks run sequentially in the same process.
python src/sleep_hydra/main.py task.runtime=1,2,3 -m
OUTPUT:
[2023-10-20 11:10:43,219][HYDRA] Launching 3 jobs locally
[2023-10-20 11:10:43,219][HYDRA] #0 : task.runtime=1
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/0
[2023-10-20 11:10:44,407][__main__][INFO] - Process ID 1597253 executed task {'runtime': 1, 'memory': 10} in 1.0000840639986563 seconds
[2023-10-20 11:10:44,411][HYDRA] #1 : task.runtime=2
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/1
[2023-10-20 11:10:46,501][__main__][INFO] - Process ID 1597253 executed task {'runtime': 2, 'memory': 10} in 2.0000894780096132 seconds
[2023-10-20 11:10:46,503][HYDRA] #2 : task.runtime=3
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-10-42/2
[2023-10-20 11:10:49,586][__main__][INFO] - Process ID 1597253 executed task {'runtime': 3, 'memory': 10} in 3.0000770180195104 seconds
Note that all process IDs are different, as each task runs in its own process.
python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=joblib -m
OUTPUT:
[2023-10-20 11:12:03,084][HYDRA] Joblib.Parallel(n_jobs=-1,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[2023-10-20 11:12:03,084][HYDRA] Launching jobs, sweep output dir : multirun/2023-10-20/11-11-57
[2023-10-20 11:12:03,085][HYDRA] #0 : task.runtime=1
[2023-10-20 11:12:03,085][HYDRA] #1 : task.runtime=2
[2023-10-20 11:12:03,085][HYDRA] #2 : task.runtime=3
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/0
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/1
Output directory : /kyukon/data/gent/vo/000/gvo00070/vsc43257/hydra_hpc_example/multirun/2023-10-20/11-11-57/2
[2023-10-20 11:12:04,881][__main__][INFO] - Process ID 1598945 executed task {'runtime': 1, 'memory': 10} in 1.0003418159903958 seconds
[2023-10-20 11:12:05,912][__main__][INFO] - Process ID 1598949 executed task {'runtime': 2, 'memory': 10} in 2.000080882018665 seconds
[2023-10-20 11:12:06,913][__main__][INFO] - Process ID 1598952 executed task {'runtime': 3, 'memory': 10} in 3.000079121993622 seconds
Note that by default no logging or print statements are shown; these are stored at `.submitit/` in the output directory, next to the output of each run. `submitit_local` uses `subprocess` to run the tasks locally, so the functionality and output are similar to joblib.
python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=submitit_local -m
OUTPUT:
[2023-10-20 11:12:57,639][HYDRA] Submitit 'local' sweep output dir : multirun/2023-10-20/11-12-57
[2023-10-20 11:12:57,641][HYDRA] #0 : task.runtime=1
[2023-10-20 11:12:57,645][HYDRA] #1 : task.runtime=2
[2023-10-20 11:12:57,653][HYDRA] #2 : task.runtime=3
The `submitit_slurm` launcher can only be executed on the HPC cluster.
python src/sleep_hydra/main.py task.runtime=1,2,3 hydra/launcher=submitit_slurm -m
OUTPUT:
[2023-10-20 11:15:04,918][HYDRA] Submitit 'slurm' sweep output dir : multirun/2023-10-20/11-15-04
[2023-10-20 11:15:04,920][HYDRA] #0 : task.runtime=1
[2023-10-20 11:15:04,924][HYDRA] #1 : task.runtime=2
[2023-10-20 11:15:04,928][HYDRA] #2 : task.runtime=3
For a full list of settings, see the HPC documentation. To see available parameters, run:
python src/sleep_hydra/main.py hydra/launcher=submitit_slurm --cfg hydra -p hydra.launcher
A Slurm job with 2 CPUs and 4 GB of RAM:
python src/sleep_hydra/main.py hydra/launcher=submitit_slurm hydra.launcher.cpus_per_task=2 hydra.launcher.mem_gb=4 -m
Besides `benchmark=runtime` and `benchmark=memory`, we can benchmark both runtime and memory in two ways.

`benchmark=all` will run the task twice, once with `timeit` and once with `memray`.
# seconds
[(3, 3.000507133983774), (2, 2.0004843850038014), (1, 1.0004661950224545)]
# megabytes
[(100, 95.39), (10, 9.559), (1, 0.999444)]
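Conceptually, the two separate measurements look like this sketch (`task` is the callable from the first sketch above; the capture file name is an assumption):

```python
import timeit

import memray

# Run 1: wall-clock time only.
seconds = timeit.timeit(lambda: task(runtime=1, memory=10), number=1)

# Run 2: memory only; memray writes all allocations to a capture file,
# which can be inspected afterwards with e.g. `memray stats benchmark.bin`.
with memray.Tracker("benchmark.bin"):
    task(runtime=1, memory=10)
```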
`benchmark=hybrid` will run the task once, with `timeit` inside the `memray` wrapper.
# seconds
[(3, 3.013979399984237), (2, 2.013261186017189), (1, 1.015038490993902)]
# megabytes
[(100, 96.392), (10, 10.561), (1, 1.001554)]
The overhead of `benchmark=hybrid` is negligible for large sizes; it is more important to run multiple times and report the variance.
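A sketch of the hybrid variant, nesting the timing inside the tracker so only a single run is needed:

```python
import timeit

import memray

# One run: timeit inside the memray wrapper, so the measured time
# includes memray's (small) tracking overhead.
with memray.Tracker("hybrid.bin"):
    seconds = timeit.timeit(lambda: task(runtime=1, memory=10), number=1)
```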