I get this error with a pytorch import python -c "import torch"
:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
import torch
File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
_load_global_deps()
File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
how does one fix it?
related:
Like eval said, it is because pytorch1.13 automatically install nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. While I have my own CUDA toolKit already installed, I have the same problem.
In my case, I used pip uninstall nvidia_cublas_cu11
and solved the problem.
I think the PyTorch team should solve this issue, since users often have their own CUDAtoolkit installed.
The error is from dlopen libcublas.so from .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/
, which is the pip package "nvidia-cuda-runtime" install location.
libcublasLt.so.11
is dynamically linked to libcublas.so.11
. The problem is that when you have a different cuda runtime installation (usually in /usr/local/cuda), dlopen probably gets the wrong one. You can run ldd .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/libcublas.so
to check the actual path of libcublasLt.so.11
, which is supposed to be the one under .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/
Workarounds:
Set env LD_LIBRARY_PATH=.../python3.9/site-packages/torch/lib/nvidia/cublas/lib/:$LD_LIBRARY_PATH
when launching python. So that dlopen can firstly look for .so files in that directory.
Using older torch. It was since 1.13.0 torch pip install started using pip nvidia-* packages. Before that cuda libs are statically linked. That's why older torch pip install has no problem even if you have existing cuda install.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With