
Tensorflow 2.3 and libcublas.so.10

According to the official documentation, TensorFlow 2.3 supports CUDA 10.1.

I have Ubuntu 20.04 with a GPU, CUDA 10.1 and cuDNN 7.6.

I get this error when I start using TensorFlow 2.3:

Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64

After some hours of investigation, it turned out that the cuBLAS packaging changed in CUDA 10.1, so the library is now installed outside the toolkit installation path:

/usr/local/cuda-10.1/lib64

See: https://forums.developer.nvidia.com/t/cublas-for-10-1-is-missing/71015/16

In my case I searched with

sudo find /usr -name 'libcublas*'

and found:

            /usr/share/doc/libcublas-dev
            /usr/share/doc/libcublas10
            /usr/local/cuda-10.1/doc/man/man7/libcublas.so.7
            /usr/local/cuda-10.1/doc/man/man7/libcublas.7
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10.2.2.214
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10.2.2.214
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcublasLt.so
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/stubs/libcublas.so
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas_static.a
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt_static.a
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10
            /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so
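The listing above mixes real files and symlinks, so it can help to see what each entry resolves to. A minimal sketch, assuming GNU `find` and `readlink -f` are available; the helper name `list_cublas` is hypothetical. Note that any file matching `libcublas*.so*` is printed, including man pages that happen to use that name.

```shell
# list_cublas DIR: print every file or symlink under DIR whose name matches
# libcublas*.so*, together with the path it finally resolves to.
list_cublas() {
    find "$1" \( -type f -o -type l \) -name 'libcublas*.so*' 2>/dev/null |
    sort |
    while IFS= read -r path; do
        printf '%s -> %s\n' "$path" "$(readlink -f "$path")"
    done
}
```

For example, `list_cublas /usr/local` would cover both the cuda-10.1 and cuda-10.2 trees.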

Then, following workaround suggestions found on the NVIDIA site, I created symlinks from the files found above into the CUDA 10.1 library directory:

sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-10.1/lib64/libcublas.so
sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10 /usr/local/cuda-10.1/lib64/libcublas.so.10

Even after the symlinks, the error persists:

Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
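One way to narrow this down is to check whether any directory in LD_LIBRARY_PATH actually holds a usable file with that name. The sketch below is only an approximation of what the dynamic loader does: it ignores the ldconfig cache and RPATH entries, and it does not check the library's own dependencies. The helper name `find_in_ld_path` is hypothetical.

```shell
# find_in_ld_path SONAME: print the first LD_LIBRARY_PATH entry that contains
# SONAME and return 0; return 1 if none does. The -e test follows symlinks,
# so a dangling symlink counts as missing.
find_in_ld_path() {
    soname=$1
    old_ifs=$IFS
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$soname" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$soname"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Running `find_in_ld_path libcublas.so.10` in the same shell that starts Python shows whether the variable set in ~/.profile is visible there at all.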

My ~/.profile contains:

# set PATH for cuda 10.1 installation
if [ -d "/usr/local/cuda-10.1/bin/" ]; then
    export PATH=/usr/local/cuda-10.1/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi

I also wanted to take the files manually from the toolkit in the cuda_10.1.168_418.67_linux.run file, as suggested here, but found that the suggested command does not work. I adjusted the command to:

sh cuda_10.1.168_418.67_linux.run --extract=/extracted

which fails at the final step with the message: Failed to verify gcc version. See log at /tmp/cuda-installer.log for details.

If only that extraction worked, a manual copy of the files might save the whole headache.

It seems that this cuBLAS step is not documented in the official TensorFlow documentation for installing with CUDA 10.1.

Any ideas?

Lu_Perr asked Sep 24 '20 17:09


2 Answers

I had the same problem and solved it thanks to your question. I used the symlink approach, but added an extra link for libcublasLt.so.10:

$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-10.1/lib64/libcublas.so
$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10 /usr/local/cuda-10.1/lib64/libcublas.so.10
$ sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10 /usr/local/cuda-10.1/lib64/libcublasLt.so.10
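The three commands above can be folded into one repeatable step. This is only a sketch: the helper name `link_cublas` is hypothetical, and `ln -sfn` replaces an existing link of the same name, which the original commands do not.

```shell
# link_cublas SRC_DIR DEST_DIR: symlink the three names used above from
# SRC_DIR into DEST_DIR, reporting any name that SRC_DIR does not contain.
link_cublas() {
    src=$1
    dest=$2
    for name in libcublas.so libcublas.so.10 libcublasLt.so.10; do
        if [ -e "$src/$name" ]; then
            ln -sfn "$src/$name" "$dest/$name"
        else
            printf 'missing: %s\n' "$src/$name" >&2
        fi
    done
}
```

With the paths from this answer, the call would be `link_cublas /usr/local/cuda-10.2/targets/x86_64-linux/lib /usr/local/cuda-10.1/lib64`. Writing under /usr/local needs root, and a shell function is not visible to sudo, so it would have to run in a root shell.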
Pedrolarben answered Nov 15 '22 00:11


sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-10.1/lib64/libcublas.so
sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10 /usr/local/cuda-10.1/lib64/libcublas.so.10

These symlinks are not an appropriate way to resolve this problem.
Instead, reinstall the libcublas packages for CUDA 10.1.

I reinstalled them with apt.

  • My environment is based on NVIDIA's CUDA repositories.
$ sudo apt install --reinstall libcublas10=10.2.1.243-1 libcublas-dev=10.2.1.243-1

After that, the libcublas libraries are placed in /usr/local/cuda-10.1/.

To prevent apt from offering them as upgrade candidates, hold the packages:

$ sudo apt-mark hold libcublas10
$ sudo apt-mark hold libcublas-dev
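To confirm where the reinstalled package actually put its files, the package file list can be filtered by install prefix. A minimal sketch; the helper name `so_under` is hypothetical, and it only filters text on stdin, so it makes no assumption about what the package contains.

```shell
# so_under PREFIX: read one path per line on stdin and print only the
# libcublas shared objects located under PREFIX.
so_under() {
    prefix=${1%/}
    while IFS= read -r path; do
        case $path in
            "$prefix"/*libcublas*.so*) printf '%s\n' "$path" ;;
        esac
    done
}
```

It could be used as `dpkg -L libcublas10 | so_under /usr/local/cuda-10.1`, and `apt-mark showhold` lists the packages currently on hold.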
nobilearn answered Nov 15 '22 01:11