How to use GPU resources with nvidia-docker2 and docker swarm

Using nvidia-docker2

This is NVIDIA's native toolkit for Docker. Once it is installed, the user simply needs to add --gpus all when launching a docker run command:

docker run --rm -it --network host --gpus all tensorflow/tensorflow:latest-gpu nvidia-smi

pros This toolkit automatically takes care of the shared resources and drivers, so the GPU can be used easily.

cons This is not supported in swarm mode, so services cannot access the GPU resources this way.

Using docker services (swarm)

It is possible to provide docker swarm with the required GPU resources by making the following changes. First, add the keys below to the Docker daemon configuration (keeping any entries already present):

sudo nano /etc/docker/daemon.json
"runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-45cbf7b"
    ]

To get the ID of the GPU, run nvidia-smi -a
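
The UUID can also be extracted directly, for example (assuming a standard nvidia-smi installation):

nvidia-smi --query-gpu=uuid --format=csv,noheader
# or filter it out of the full report:
nvidia-smi -a | grep UUID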

Another file also needs to be changed:

sudo nano /etc/nvidia-container-runtime/config.toml

and add or uncomment the following line:

swarm-resource = "DOCKER_RESOURCE_GPU"
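
In a default installation this line is usually already present near the top of the file but commented out; after uncommenting it, the relevant part of config.toml should look roughly like this (surrounding keys may differ between versions):

disable-require = false
swarm-resource = "DOCKER_RESOURCE_GPU"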

Once this is done, docker needs to be restarted:

sudo systemctl restart docker.service
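
To confirm that the nvidia runtime was picked up as the default, the runtime lines of docker info can be checked, for example:

docker info | grep -i runtime

The output should list nvidia among the available runtimes and as the default runtime.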

Start the swarm

docker swarm init
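
To check that the GPU has been advertised to the swarm, the node description can be inspected. A sketch, with the field path taken from the Docker Engine node object (it may vary between API versions):

docker node inspect self --format '{{ json .Description.Resources.GenericResources }}'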

The service that needs the GPU can then be launched with:

docker service create --replicas 1 --name test-gpu --generic-resource "NVIDIA-GPU=0" tensorflow/tensorflow:latest-gpu sh -c "nvidia-smi"
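
To verify that the task ran and that nvidia-smi saw the GPU inside the container, the service state and logs can be checked:

docker service ps test-gpu
docker service logs test-gpu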

pros This works nicely for deploying docker swarm services with GPU resources. These modifications do not create conflicts with nvidia-docker2.

cons Several configuration files have to be modified before anything can be run.