How to use GPU resources with nvidia-docker2 and Docker Swarm
nvidia-docker2 is NVIDIA's native toolkit for Docker. Once it is installed, the user simply needs to add --gpus all when launching a docker run command:
docker run --rm -it --network host --gpus all tensorflow/tensorflow:latest-gpu nvidia-smi
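If only a specific GPU should be visible inside the container, the --gpus flag also accepts a device selector instead of all (a sketch; the index 0 is just an example and any index or GPU UUID reported by nvidia-smi can be used the same way):
docker run --rm -it --network host --gpus device=0 tensorflow/tensorflow:latest-gpu nvidia-smi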
Pros: This toolkit automatically takes care of the shared resources and drivers, so the GPU can be used easily.
Cons: The --gpus flag is not supported in swarm mode, so the user cannot access GPU resources from swarm services.
It is possible to provide the required GPU resources to Docker Swarm by making the following changes to:
sudo nano /etc/docker/daemon.json
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia",
"node-generic-resources": [
"NVIDIA-GPU=GPU-45cbf7b"
]
To get the ID of the GPU, run nvidia-smi -a and look for the GPU UUID field.
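If only the identifiers are needed, nvidia-smi can also print them directly (a convenience query using standard nvidia-smi query properties):
nvidia-smi --query-gpu=index,name,uuid --format=csv,noheader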
Another file also needs to be changed:
sudo nano /etc/nvidia-container-runtime/config.toml
Add or uncomment the following line:
swarm-resource = "DOCKER_RESOURCE_GPU"
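In a default installation this line is usually already present near the top of config.toml, just commented out; the relevant fragment then looks roughly like this (the surrounding keys can differ between nvidia-container-runtime versions):
disable-require = false
swarm-resource = "DOCKER_RESOURCE_GPU"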
Once this is done, Docker needs to be restarted:
sudo systemctl restart docker.service
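To check that the nvidia runtime is now the default, docker info can be inspected (a quick sanity check, not strictly required):
docker info | grep -i "default runtime"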
Start the swarm
docker swarm init
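If other GPU machines should take part in the swarm, each of them needs the same daemon.json and config.toml changes (with its own GPU UUID) and can then join using the token printed by docker swarm init; the token and manager address below are placeholders:
docker swarm join --token <token> <manager-ip>:2377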
and the service that needs a GPU can be launched using:
docker service create --replicas 1 --name test-gpu --generic-resource "NVIDIA-GPU=0" tensorflow/tensorflow:latest-gpu sh -c "nvidia-smi"
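The nvidia-smi output of the task can then be inspected through the service logs (assuming the default logging driver is in use):
docker service logs test-gpu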
Pros: This works nicely for deploying Docker Swarm services with GPU resources. These modifications do not create conflicts with nvidia-docker2.
Cons: There are configuration files to be modified before anything can be run.