 

Using GPU inside docker container - CUDA Version: N/A and torch.cuda.is_available returns False

I'm trying to use the GPU from inside my Docker container. I'm using Docker 19.03 on Ubuntu 18.04.

Outside the Docker container, if I run nvidia-smi I get the output below.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If I run the same thing inside a container created from the nvidia/cuda Docker image, I get the same output as above and everything runs smoothly; torch.cuda.is_available() returns True.

But if I run the same nvidia-smi command inside any other Docker container, it gives the following output, where you can see that the CUDA Version shows as N/A. Inside these containers, torch.cuda.is_available() also returns False.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
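
A quick way to compare the two situations from inside each container is the check below. This is only a sketch: it assumes python3 and torch are already installed in the image, and it inspects the environment variable that the nvidia/cuda images set themselves.

# What PyTorch sees inside the container
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# The nvidia/cuda images set this to "compute,utility"; a plain ubuntu image leaves it unset
echo "NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES}"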

I have installed nvidia-container-toolkit using the following commands.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker
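
As an optional sanity check (a sketch, assuming the toolkit and its libnvidia-container dependency installed cleanly), the container CLI can report what it sees on the host before any container is started:

# Should list the driver version and the Tesla T4; may need sudo
nvidia-container-cli info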

I started my containers using the following commands:

sudo docker run --rm --gpus all nvidia/cuda nvidia-smi
sudo docker run -it --rm --gpus all ubuntu nvidia-smi
Asked Sep 05 '20 by Sai Chander

1 Answer

For anybody arriving here looking for how to do this with Docker Compose, add the following to your service:

deploy:
  resources:
    reservations:
      devices:
      - driver: nvidia
        capabilities:
          - gpu
          - utility # nvidia-smi
          - compute # CUDA. Required to avoid "CUDA version: N/A"
          - video   # NVDEC/NVENC, e.g. for hardware-accelerated ffmpeg. Skip it if you don't need it.

Doc: https://docs.docker.com/compose/gpu-support
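
For plain docker run, a rough equivalent of the compute capability line above (a sketch; nvidia-container-toolkit reads the NVIDIA_DRIVER_CAPABILITIES variable, and ubuntu here is just an example image) is:

# Request compute (CUDA) in addition to utility (nvidia-smi) for an image that doesn't set it itself
sudo docker run -it --rm --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  ubuntu nvidia-smi

With compute included, nvidia-smi inside the container should report a CUDA version instead of N/A, and torch.cuda.is_available() can return True, provided the image has a CUDA-enabled PyTorch build installed.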

Answered Sep 28 '22 by GG.