I have a container that loads a PyTorch model. Every time I try to start it, I get this error:
Traceback (most recent call last):
  File "server/start.py", line 166, in <module>
    start()
  File "server/start.py", line 94, in start
    app.register_blueprint(create_api(), url_prefix="/api/1")
  File "/usr/local/src/skiff/app/server/server/api.py", line 30, in create_api
    atomic_demo_model = DemoModel(model_filepath, comet_dir)
  File "/usr/local/src/comet/comet/comet/interactive/atomic_demo.py", line 69, in __init__
    model = interactive.make_model(opt, n_vocab, n_ctx, state_dict)
  File "/usr/local/src/comet/comet/comet/interactive/functions.py", line 98, in make_model
    model.to(cfg.device)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 379, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    _check_driver()
  File "/usr/local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 82, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
I know that nvidia-docker2 is working:
$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Tue Jul 16 22:09:40 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:1A:00.0 Off | N/A |
| 0% 44C P0 72W / 260W | 0MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:1B:00.0 Off | N/A |
| 0% 44C P0 66W / 260W | 0MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:1E:00.0 Off | N/A |
| 0% 44C P0 48W / 260W | 0MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:3E:00.0 Off | N/A |
| 0% 41C P0 54W / 260W | 0MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce RTX 208... Off | 00000000:3F:00.0 Off | N/A |
| 0% 42C P0 48W / 260W | 0MiB / 10989MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce RTX 208... Off | 00000000:41:00.0 Off | N/A |
| 0% 42C P0 1W / 260W | 0MiB / 10989MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
However, I keep getting the error above.
I've tried the following:
Setting "default-runtime": nvidia in /etc/docker/daemon.json
Using docker run --runtime=nvidia <IMAGE_ID>
Adding the variables below to my Dockerfile:
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
LABEL com.nvidia.volumes.needed="nvidia_driver"
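For the daemon.json item in the list above, setting the default runtime is normally not enough on its own: the nvidia runtime also has to be registered under "runtimes", and the Docker daemon has to be restarted after the file changes. A minimal sketch in Python that checks a parsed daemon.json for both entries. The function name check_daemon_config and the EXAMPLE value are hypothetical, and the layout shown is the one commonly given in the nvidia-docker2 instructions, not a copy of the asker's file.

```python
import json
from typing import List


def check_daemon_config(config: dict) -> List[str]:
    """Return a list of problems found in a parsed /etc/docker/daemon.json."""
    problems = []
    runtimes = config.get("runtimes", {})
    if "nvidia" not in runtimes:
        problems.append('no "nvidia" entry under "runtimes"')
    elif not runtimes["nvidia"].get("path"):
        problems.append('"nvidia" runtime has no "path"')
    if config.get("default-runtime") != "nvidia":
        problems.append('"default-runtime" is not "nvidia"')
    return problems


# Assumed layout, as commonly shown for nvidia-docker2 (not the asker's file).
EXAMPLE = json.loads("""
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
""")
```

To check a real host, load the file with json.load(open("/etc/docker/daemon.json")) and pass the result to check_daemon_config; an empty list means both entries are present.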
I expect this container to run - we have a working version in production without these issues. And I know that Docker can find the drivers, as the output above shows. Any ideas?
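One way to narrow this down is to run a small check inside the failing container itself, since the nvidia-smi test above used a different image. The sketch below is only illustrative: gpu_environment_report is a hypothetical helper, and it assumes that a missing nvidia-smi binary or unset NVIDIA_* variables inside the container point to the NVIDIA runtime not being applied to that container.

```python
import importlib.util
import os
import shutil


def gpu_environment_report(environ=None) -> dict:
    """Collect the GPU-related facts visible from inside the container."""
    environ = os.environ if environ is None else environ
    report = {
        "NVIDIA_VISIBLE_DEVICES": environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        # The NVIDIA runtime normally mounts nvidia-smi into the container.
        "nvidia_smi_on_path": shutil.which("nvidia-smi", path=environ.get("PATH")) is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_available": None,
    }
    if report["torch_installed"]:
        try:
            import torch
            report["torch_cuda_available"] = torch.cuda.is_available()
        except Exception as exc:  # a broken install is reported, not raised
            report["torch_cuda_available"] = "import failed: %s" % exc
    return report
```

Printing gpu_environment_report() from the container's entrypoint, before the model is loaded, shows whether the variables and the driver utilities actually reached the container.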
With the release of Docker 19.03, the nvidia-docker2 packages are deprecated, since NVIDIA GPUs are now natively supported as devices in the Docker runtime. On Docker 19.03 or later with the NVIDIA Container Toolkit installed, GPUs are requested with the --gpus flag (for example, docker run --gpus all <image_name>) instead of --runtime=nvidia.
Key points: Linux containers support all graphics APIs for NVIDIA GPUs using the NVIDIA Container Toolkit. Windows containers support DirectX-based graphics APIs for GPUs from all vendors using native hardware acceleration support.
I got the same error. After trying a number of solutions, I found that the following command worked:
docker run -ti --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all <image_name>
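If the container is launched from a script, the same invocation can be assembled programmatically. The sketch below builds the argument list for the command above and, optionally, for the --gpus all form available from Docker 19.03 onward. docker_run_args is a hypothetical helper and "my-image" in the usage note is a placeholder; passing the two environment variables together with --gpus is an assumption made for symmetry and may be redundant.

```python
from typing import List


def docker_run_args(image: str, use_gpus_flag: bool = False) -> List[str]:
    """Build a `docker run` argument list that exposes all GPUs to the container."""
    args = ["docker", "run", "-ti"]
    if use_gpus_flag:
        # Native GPU support (Docker 19.03+ with the NVIDIA Container Toolkit).
        args += ["--gpus", "all"]
    else:
        # Legacy nvidia-docker2 runtime, as in the command above.
        args += ["--runtime=nvidia"]
    args += [
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
        "-e", "NVIDIA_VISIBLE_DEVICES=all",
        image,
    ]
    return args
```

The list can be passed directly to subprocess.run, for example subprocess.run(docker_run_args("my-image"), check=True), which avoids shell quoting issues.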