How to get Docker to recognize NVIDIA drivers?

I have a container that loads a PyTorch model. Every time I try to start it, I get this error:

Traceback (most recent call last):
  File "server/start.py", line 166, in <module>
    start()
  File "server/start.py", line 94, in start
    app.register_blueprint(create_api(), url_prefix="/api/1")
  File "/usr/local/src/skiff/app/server/server/api.py", line 30, in create_api
    atomic_demo_model = DemoModel(model_filepath, comet_dir)
  File "/usr/local/src/comet/comet/comet/interactive/atomic_demo.py", line 69, in __init__
    model = interactive.make_model(opt, n_vocab, n_ctx, state_dict)
  File "/usr/local/src/comet/comet/comet/interactive/functions.py", line 98, in make_model
    model.to(cfg.device)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 379, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    _check_driver()
  File "/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 82, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
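In PyTorch versions of this era, that assertion is typically raised when the CUDA driver version reported to `torch.cuda` is 0, which usually means the driver library `libcuda.so.1` could not be loaded inside the container. Per the NVIDIA container runtime documentation, `nvidia-smi` only needs the `utility` driver capability, while CUDA applications such as PyTorch also need `compute`. A minimal stdlib-only sketch for checking what the container actually sees is below; the function name and the idea of copying it into the image as a hypothetical `check_gpu.py` (then running `docker run --runtime=nvidia --rm <IMAGE_ID> python check_gpu.py`) are assumptions, not part of the original setup.

```python
import ctypes
import glob
import os


def diagnose_nvidia_in_container():
    """Hypothetical helper: report what a CUDA program would find here.

    Checks, assuming a typical nvidia-docker2 setup:
      - whether the CUDA driver library libcuda.so.1 can be loaded,
      - which /dev/nvidia* device nodes exist,
      - the NVIDIA_* variables the NVIDIA container runtime reads.
    """
    try:
        ctypes.CDLL("libcuda.so.1")  # raises OSError if the library is missing
        libcuda_loadable = True
    except OSError:
        libcuda_loadable = False

    return {
        "libcuda_loadable": libcuda_loadable,
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": os.environ.get("NVIDIA_DRIVER_CAPABILITIES"),
    }


if __name__ == "__main__":
    for key, value in diagnose_nvidia_in_container().items():
        print(f"{key}: {value}")
```

If `libcuda_loadable` is False while `nvidia-smi` works, the runtime is most likely not being given the `compute` capability for this container.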

I know that nvidia-docker2 is working.

$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Tue Jul 16 22:09:40 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:1A:00.0 Off |                  N/A |
|  0%   44C    P0    72W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:1B:00.0 Off |                  N/A |
|  0%   44C    P0    66W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:1E:00.0 Off |                  N/A |
|  0%   44C    P0    48W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:3E:00.0 Off |                  N/A |
|  0%   41C    P0    54W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  Off  | 00000000:3F:00.0 Off |                  N/A |
|  0%   42C    P0    48W / 260W |      0MiB / 10989MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  Off  | 00000000:41:00.0 Off |                  N/A |
|  0%   42C    P0     1W / 260W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

However, I keep getting the error above.

I've tried the following:

- Setting "default-runtime": "nvidia" in /etc/docker/daemon.json
- Using docker run --runtime=nvidia <IMAGE_ID>
- Adding the variables below to my Dockerfile:

    ENV NVIDIA_VISIBLE_DEVICES all
    ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
    LABEL com.nvidia.volumes.needed="nvidia_driver"
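One way to confirm that those ENV lines actually made it into the image being started (for example, that it was rebuilt after editing the Dockerfile) is to read the image's configured environment with `docker image inspect --format '{{json .Config.Env}}' <IMAGE_ID>`. A minimal Python sketch of that check follows; the helper names are hypothetical, and only the parsing function runs without a Docker daemon.

```python
import json
import subprocess


def nvidia_env_from_inspect(env_json):
    """Return the NVIDIA_* entries from the JSON list printed by
    `docker image inspect --format '{{json .Config.Env}}' IMAGE`."""
    env = {}
    # .Config.Env is printed as "null" when the image defines no variables.
    for entry in json.loads(env_json) or []:
        key, _, value = entry.partition("=")
        if key.startswith("NVIDIA_"):
            env[key] = value
    return env


def image_nvidia_env(image):
    """Inspect `image` with the docker CLI (must be installed and running)."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .Config.Env}}", image],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    return nvidia_env_from_inspect(out)
```

An empty result from `image_nvidia_env("<IMAGE_ID>")` would suggest the image in use does not carry the variables from the Dockerfile.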

I expect this container to run: we have a working version in production without these issues. I also know that Docker can find the drivers, as the nvidia-smi output above shows. Any ideas?

asked Jul 16 '19 22:07

infinitely_improbable

1 Answer

I got the same error. After trying a number of solutions, I found that the following works:

docker run -ti --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all <image_name>

answered Oct 10 '22 16:10

chirag

Tags: docker, docker-compose, ubuntu-18.04, pytorch, nvidia-docker