
Cannot dlopen some GPU libraries. Skipping registering GPU devices

TensorFlow only uses the CPU and won't use the GPU. I assume it's because it expects CUDA 10.0 and finds 10.2.

I had installed 10.2, but I have since purged it and installed 10.0.

I'm running Ubuntu 19.10 with an AMD Ryzen 2700 CPU and an RTX 2080 Super. I have installed the 440 NVIDIA driver. Both nvidia-smi and nvcc --version still report CUDA version 10.2.

From pip3:

tensorflow-gpu           1.14.0
tensorflow-datasets      2.0.0
tensorflow-estimator     1.14.0
tensorflow-metadata      0.21.1

From nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 440.44       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:08:00.0  On |                  N/A |
|  0%   48C    P8    13W / 250W |    369MiB /  7979MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1110      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1611      G   /usr/lib/xorg/Xorg                            73MiB |
|    0      1816      G   /usr/bin/gnome-shell                         108MiB |
|    0      2655      C   python3                                      115MiB |
+-----------------------------------------------------------------------------+

From nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

But when I check version.txt, I get 10.0.130:

cat /usr/local/cuda/version.txt 
CUDA Version 10.0.130

I check the devices with:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Result:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4810338588393992961
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 7271419476897292826
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 4332706623198547606
physical_device_desc: "device: XLA_GPU device"
]

How do I register the 10.0.130 installation? Is that the reason why it won't run on the GPU? It's super slow on the 8-core CPU. Any advice?

Here is the log:

2020-02-13 14:11:31.411277: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 14:11:31.440150: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3193485000 Hz
2020-02-13 14:11:31.441076: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5625b689c790 executing computations on platform Host. Devices:
2020-02-13 14:11:31.441123: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 14:11:31.443001: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-02-13 14:11:31.472935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:31.473407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.487124: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-02-13 14:11:31.496148: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-02-13 14:11:31.498873: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-02-13 14:11:31.514842: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-02-13 14:11:31.525992: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:31.526183: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:31.618627: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:31.618655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2020-02-13 14:11:31.618662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2020-02-13 14:11:31.620367: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

2020-02-13 14:11:31.621395: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5625b732d5f0 executing computations on platform CUDA. Devices:
2020-02-13 14:11:31.621407: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13330791690361361129
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11872341970779952422
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15007819717683015571
physical_device_desc: "device: XLA_GPU device"
]
WARNING:tensorflow:From pokeGAN.py:172: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From pokeGAN.py:174: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From pokeGAN.py:77: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.


2020-02-13 14:11:33.799163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:33.799597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:33.799646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:33.799658: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-02-13 14:11:33.799669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-02-13 14:11:33.799684: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-02-13 14:11:33.799695: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-02-13 14:11:33.799706: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-02-13 14:11:33.799777: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:33.799786: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:33.800016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:33.800028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      
WARNING:tensorflow:From pokeGAN.py:203: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

2020-02-13 14:11:34.197990: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/node/.local/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From pokeGAN.py:211: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
total training sample num:91
batch size: 64, batch num per epoch: 1, epoch num: 5000
start training...
asked Feb 13 '20 13:02 by dev


2 Answers

Judging from your logs, it looks like TensorFlow finds the correct CUDA version, but the cuDNN library is missing:

2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
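
In a long log it is easy to miss which library actually failed to load. As a rough sketch, the failing entries can be pulled out with a small Python script. The function name `find_dlopen_failures` is made up for illustration, and the regular expressions assume the message wording shown in the log above:

```python
import re

# Patterns assume the dso_loader wording seen in the log above.
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")
LIBRARY_PATH = re.compile(r"LD_LIBRARY_PATH: (\S+)")


def find_dlopen_failures(log_text):
    """Map each library that failed to load to the LD_LIBRARY_PATH the log reports.

    The value is None when the log line does not mention LD_LIBRARY_PATH.
    """
    failures = {}
    for line in log_text.splitlines():
        match = DLOPEN_FAILURE.search(line)
        if match:
            path = LIBRARY_PATH.search(line)
            failures[match.group(1)] = path.group(1) if path else None
    return failures


sample_log = (
    "I dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0\n"
    "I dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; "
    "dlerror: libcudnn.so.7: cannot open shared object file: "
    "No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64\n"
)
print(find_dlopen_failures(sample_log))
```

For the log in the question this would single out libcudnn.so.7 and show that the loader was searching /usr/local/cuda-10.2/lib64.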

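Have you installed the correct version of cuDNN? According to TensorFlow's tested build configurations, TensorFlow 1.14 requires cuDNN 7.4 in addition to CUDA 10.0. Note also that the LD_LIBRARY_PATH in your log points to /usr/local/cuda-10.2/lib64, so libcudnn.so.7 has to be in a directory the loader actually searches.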
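
One way to test this without starting TensorFlow is to ask the dynamic loader directly. This is only a sketch: `can_dlopen` is a hypothetical helper name, and the two library names are taken from the log in the question. Run it in the same shell (same LD_LIBRARY_PATH) that you use to launch your script:

```python
import ctypes


def can_dlopen(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


# Library names as they appear in the TensorFlow log above.
for name in ("libcudart.so.10.0", "libcudnn.so.7"):
    print(name, "loads" if can_dlopen(name) else "cannot be opened")
```

If libcudart.so.10.0 loads but libcudnn.so.7 does not, the CUDA toolkit is probably fine and only cuDNN is missing or outside the loader's search path.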

answered Nov 07 '22 03:11 by idinu


The only thing that solved this issue for me was to completely remove CUDA and reinstall it.
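
Before or after reinstalling, it may help to see which directories actually contain cuDNN. The following is a minimal sketch, not an official tool: `find_library` is a made-up helper, and the two hard-coded directories are assumptions about where CUDA is usually installed, so adjust them to your system:

```python
import glob
import os


def find_library(pattern, directories):
    """Return the sorted paths matching the glob pattern in the given directories."""
    matches = []
    for directory in directories:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)


# Directories from LD_LIBRARY_PATH, plus assumed default CUDA locations.
search_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
search_dirs += ["/usr/local/cuda/lib64", "/usr/local/cuda-10.0/lib64"]  # assumption

print(find_library("libcudnn.so*", search_dirs))
```

An empty list suggests that cuDNN is not installed in any of the searched directories.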

answered Nov 07 '22 04:11 by Hagbard