I'm trying to use GPU from inside my docker container. I'm using docker with version 19.03 on Ubuntu 18.04.
Outside the docker container if I run nvidia-smi I get the below output.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 30C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
If I run the samething inside the container created from nvidia/cuda docker image, I get the same output as above and everything is running smoothly. torch.cuda.is_available() returns True.
But If I run the same nvidia-smi command inside any other docker container, it gives the following output where you can see that the CUDA Version is coming as N/A. Inside the containers torch.cuda.is_available() also returns False.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 30C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I have installed nvidia-container-toolkit using the following commands.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker
I started my containers using the following commands
sudo docker run --rm --gpus all nvidia/cuda nvidia-smi
sudo docker run -it --rm --gpus all ubuntu nvidia-smi
However, Docker® containers are most commonly used to easily deploy CPU-based applications on several machines, where containers are both hardware- and platform-agnostic. The Docker engine doesn't natively support NVIDIA GPUs as it uses specialized hardware that requires the NVIDIA driver to be installed.
The NVIDIA Container Toolkit for Docker is required to run CUDA images. For CUDA 10.0, nvidia-docker2 (v2. 1.0) or greater is recommended. It is also recommended to use Docker 19.03.
Time-sharing allows a maximum of 48 containers to share a physical GPU whereas multi-instance GPUs on A100 allows up to a maximum of 7 partitions. If you want to maximize your GPU utilization, you can configure time-sharing for each multi-instance GPU partition.
For anybody arriving here looking how to do it with docker compose, add to your service:
deploy:
resources:
reservations:
devices:
- driver: nvidia
capabilities:
- gpu
- utility # nvidia-smi
- compute # CUDA. Required to avoid "CUDA version: N/A"
- video # NVDEC/NVENC. For instance to use a hardware accelerated ffmpeg. Skip it if you don't need it
Doc: https://docs.docker.com/compose/gpu-support
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With