Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN

I have been working on getting an application that relies on TensorFlow to work as a docker container with nvidia-docker. I have compiled my application on top of the tensorflow/tensorflow:latest-gpu-py3 image. I run my docker container with the following command:

sudo nvidia-docker run -d -p 9090:9090 -v /src/weights:/weights myname/myrepo:mylabel

When looking at the logs through portainer I see the following:

2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) 
"""
2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0

The container does seem to start properly, and my application does appear to be running. When I send requests to it for predictions the predictions are returned correctly - however at the slow speed I would expect when running inference on the CPU, so I think it's pretty clear that the GPU is not being used for some reason. I've also tried running nvidia-smi from within that same container to make sure it is seeing my GPU and these are the results for that:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K1             Off  | 0000:00:07.0     Off |                  N/A |
| N/A   28C    P8     7W /  31W |     25MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I'm certainly no expert in this - but it does appear that the GPU is visible from inside the container. Any ideas on how to get this working with TensorFlow?

like image 840
abagshaw Avatar asked May 04 '26 12:05

abagshaw


1 Answers

I tried installing nvidia-modrpobe, but still the same error. Then a simple system reboot worked for me

like image 141
vinita Avatar answered May 06 '26 03:05

vinita



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!