Using GPU inside docker container - CUDA Version: N/A and torch.cuda.is_available returns False

I'm trying to use a GPU from inside my Docker container. I'm using Docker 19.03 on Ubuntu 18.04.

Outside the Docker container, running nvidia-smi gives the output below.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

If I run the same command inside a container created from the nvidia/cuda Docker image, I get the same output as above and everything runs smoothly. torch.cuda.is_available() returns True.

But if I run the same nvidia-smi command inside any other Docker container, I get the following output, where the CUDA Version is reported as N/A. Inside those containers torch.cuda.is_available() also returns False.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
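The only difference between the two outputs is the "CUDA Version" field in the banner line. As a rough sketch (the helper name cuda_version_from_smi is hypothetical, and the two banner strings are copied from the outputs above), a script could detect the N/A case like this:

```python
import re
from typing import Optional

# Matches the "CUDA Version: <value>" field in the nvidia-smi banner line.
_CUDA_FIELD = re.compile(r"CUDA Version:\s*(\S+)")


def cuda_version_from_smi(output: str) -> Optional[str]:
    """Return the CUDA version shown by nvidia-smi, or None if it is N/A or missing."""
    match = _CUDA_FIELD.search(output)
    if match is None or match.group(1) == "N/A":
        return None
    return match.group(1)


# Banner lines taken from the host output and the failing container output.
host_banner = "| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |"
container_banner = "| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: N/A      |"

print(cuda_version_from_smi(host_banner))       # 11.0
print(cuda_version_from_smi(container_banner))  # None
```

In practice the input would be the captured text of nvidia-smi run inside the container being checked.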

I have installed nvidia-container-toolkit using the following commands.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-container-toolkit
sudo systemctl restart docker

I started my containers using the following commands:

sudo docker run --rm --gpus all nvidia/cuda nvidia-smi
sudo docker run -it --rm --gpus all ubuntu nvidia-smi
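One way to narrow this down from inside a container, without needing PyTorch, is to check whether the CUDA driver library can be loaded at all. The sketch below assumes the library is exposed under the usual name libcuda.so.1 and that a missing library is what produces "CUDA Version: N/A"; the function name is hypothetical.

```python
import ctypes


def cuda_driver_library_available(name: str = "libcuda.so.1") -> bool:
    """Return True if the shared library `name` can be loaded by the dynamic linker."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # The library was not found or could not be loaded in this environment.
        return False
    return True


if __name__ == "__main__":
    # Assumption: libcuda.so.1 is the CUDA driver library name on Linux.
    print("libcuda loadable:", cuda_driver_library_available())
```

If this prints False in the ubuntu container but True in the nvidia/cuda one, the driver's compute libraries are probably not being mounted into the failing container.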

421

asked Sep 05 '20 08:09

Sai Chander

1 Answer

For anybody arriving here looking for how to do this with Docker Compose, add the following to your service:

deploy:
  resources:
    reservations:
      devices:
      - driver: nvidia
        capabilities:
          - gpu
          - utility # nvidia-smi
          - compute # CUDA. Required to avoid "CUDA version: N/A"
          - video   # NVDEC/NVENC. For instance to use a hardware accelerated ffmpeg. Skip it if you don't need it

Doc: https://docs.docker.com/compose/gpu-support
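For plain docker run, the same capabilities can likely be requested through the NVIDIA_DRIVER_CAPABILITIES environment variable. This is not part of the answer above: it is a sketch that assumes the NVIDIA container runtime honors that variable when it is passed with -e, and gpu_run_command is a hypothetical helper that only builds the command line.

```python
import shlex
from typing import List, Sequence


def gpu_run_command(
    image: str,
    command: Sequence[str],
    capabilities: Sequence[str] = ("compute", "utility"),
) -> List[str]:
    """Build a `docker run` argv that requests all GPUs plus the given driver capabilities."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",
        # Assumption: the NVIDIA runtime reads this variable to decide which
        # driver libraries (CUDA, nvidia-smi/NVML, ...) to mount into the container.
        "-e", "NVIDIA_DRIVER_CAPABILITIES=" + ",".join(capabilities),
        image, *command,
    ]


argv = gpu_run_command("ubuntu", ["nvidia-smi"])
print(shlex.join(argv))
# docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility ubuntu nvidia-smi
```

The printed command can be pasted into a shell, or argv can be handed to subprocess.run on a host where Docker and the NVIDIA container toolkit are installed.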

107

answered Sep 28 '22 10:09

GG.

Tags: docker, docker-compose, cuda, pytorch, nvidia-docker