I have installed (I think) TensorFlow with CUDA support using the command pip3.6 install tensorflow-gpu, per the TF installation page.
My local CUDA installation is CUDA 9.0 with cuDNN 7.3.1, in /usr/local/cuda-9.0.
Following the tip in https://github.com/tensorflow/tensorflow/issues/10827, I am checking which libraries TF uses (in a virtualenv):
% python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib() + "/python/_pywrap_tensorflow_internal.so")' | xargs ldd
linux-vdso.so.1 (0x00007fff57eb8000)
libtensorflow_framework.so => /home/mark/projects/bench/venvs/ve_tf/lib/python3.6/site-packages/tensorflow/python/../libtensorflow_framework.so (0x00007ff29fa25000)
libcublas.so.9.0 => /usr/local/cuda-9.0/lib64/libcublas.so.9.0 (0x00007ff29bda8000)
libcusolver.so.9.0 => /usr/local/cuda-9.0/lib64/libcusolver.so.9.0 (0x00007ff2971ad000)
libcudart.so.9.0 => /usr/local/cuda-9.0/lib64/libcudart.so.9.0 (0x00007ff296f40000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff296d3c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff296b1f000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007ff2968f2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff2965ee000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff2963e6000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff296064000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff295e4d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff295aae000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff2cb7c7000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ff294f0e000)
libcudnn.so.7 => /usr/local/cuda-9.0/lib64/libcudnn.so.7 (0x00007ff282bd5000)
libcufft.so.9.0 => /usr/local/cuda-9.0/lib64/libcufft.so.9.0 (0x00007ff27ab34000)
libcurand.so.9.0 => /usr/local/cuda-9.0/lib64/libcurand.so.9.0 (0x00007ff276bd0000)
libnvidia-fatbinaryloader.so.390.77 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.77 (0x00007ff276984000)
libcudnn.so seems to point to the right library; however, I have doubts about libcuda.so:
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ff294f0e000)
% readlink -f /usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77
OK, so it apparently resolves to the CUDA library provided by the NVIDIA device driver.
Is this normal? Should it not use the libcuda.so in /usr/local/cuda-9.0?
I do have one: /usr/local/cuda-9.0/lib64/stubs/libcuda.so.
Yes, it's normal.
The libcuda used should definitely be the one installed by the GPU driver. It should not be the one in the stubs directory.
The one in the stubs directory (or any libcuda.so under the /usr/local/cuda... path) is there for a different purpose: building applications in certain scenarios, not running them.
To run an application (like TensorFlow), the libcuda shared object must be the one provided by the driver.
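As a rough way to automate the check from the question, here is a minimal Python sketch that parses ldd output and reports whether libcuda resolved into a stubs directory. The helper names (resolved_libs, libcuda_from_stubs) are made up for illustration, and the sample text is just two lines copied from the ldd output above.

```python
import re

# Matches ldd lines of the form "name => /resolved/path (0xaddr)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libs(ldd_output):
    """Map each library name in ldd output to the path it resolved to."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def libcuda_from_stubs(ldd_output):
    """Return True if a libcuda entry resolves into a 'stubs' directory."""
    return any(
        name.startswith("libcuda.so") and "/stubs/" in path
        for name, path in resolved_libs(ldd_output).items()
    )


# Two lines taken from the ldd output in the question.
sample = """\
libcudart.so.9.0 => /usr/local/cuda-9.0/lib64/libcudart.so.9.0 (0x00007ff296f40000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ff294f0e000)
"""
print(libcuda_from_stubs(sample))  # False: libcuda comes from the driver path
```

A result of True would suggest the loader is picking up the stub, for example because the stubs directory was added to LD_LIBRARY_PATH.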
(The libcuda.so in the stubs directory is provided for the scenario where you have a CUDA toolkit installed but no GPU driver, and you want to build GPU applications but, of course, not run them. Such a scenario could exist on a head node/login node in a compute cluster, for example. The login node/build node may not have a GPU installed, but you may still want to build CUDA driver API applications there. Such applications need to link against the driver API library, and that library is libcuda.so. Therefore, for this scenario, a "stub" library is provided. The "stub" library has everything needed for the linking process, but is otherwise not functional.)
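If you want to confirm that the libcuda the loader finds is a working driver library, one option is to load it and ask for its version. The sketch below uses Python's ctypes and the driver API call cuDriverGetVersion; the function name cuda_driver_version is hypothetical, and I am assuming that a missing or non-functional library either fails to load or returns a non-zero status.

```python
import ctypes


def cuda_driver_version(libname="libcuda.so.1"):
    """Return the CUDA version reported by the driver library, or None."""
    try:
        lib = ctypes.CDLL(libname)
        version = ctypes.c_int(0)
        # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
        status = lib.cuDriverGetVersion(ctypes.byref(version))
    except (OSError, AttributeError):
        # Library not found on the loader path, or symbol missing.
        return None
    return version.value if status == 0 else None


print(cuda_driver_version())
```

On a machine with a working driver this should print an integer encoding the highest CUDA version the driver supports (major * 1000 + minor * 10); on a machine without a driver it prints None.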