
CUDA issue in TensorFlow 1.0 tutorial: looks like TF can't find CUPTI/lib64?

This question has nothing to do with the SSE/AVX warnings; I've included the full output for completeness. The issue, I think, is the failure on a CUDA library at the end. The machine has an NVIDIA 1070 card and has the CUDA libs that are loaded earlier in the process, but something is missing at the end. I pip installed release 1.0 of TensorFlow and also downloaded the repo separately to get the most up-to-date tutorials. I picked this tutorial specifically because it exercises all of TensorBoard's capabilities. I'm using Anaconda and Python 3.6. When I run Minst_with_summaries.py from the TensorFlow tutorials in the repo (I copied the file out of the repo into a working directory), I get the following:

(py36) tom@tomServal:~/Documents/LearningRepos/Working$ python Minst_with_summaries.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.48GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Accuracy at step 0: 0.1213
Accuracy at step 10: 0.6962
Accuracy at step 20: 0.8054
Accuracy at step 30: 0.8447
Accuracy at step 40: 0.8718
Accuracy at step 50: 0.8779
Accuracy at step 60: 0.8846
Accuracy at step 70: 0.8783
Accuracy at step 80: 0.8853
Accuracy at step 90: 0.8989
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: :/usr/local/cuda/lib64
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()->GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/tom/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
Aborted

It looks to me like the TensorFlow installation may be missing something; see the last several lines above. How can I fix this? Also see this issue on GitHub: https://github.com/tensorflow/tensorflow/issues/7975

The answer was posted on GitHub and it seems there is an install bug that can be fixed by:

adding /usr/local/cuda/extras/CUPTI/lib64 to your LD_LIBRARY_PATH

Would be helpful if @mrry would reopen so others can see the correct resolution.

dartdog Avatar asked Oct 17 '22 16:10



1 Answer

Also reference this issue on GitHub: https://github.com/tensorflow/tensorflow/issues/7975

You may try the apt-get install that the GitHub issue suggests, but that did not do it for me. This did:

The answer was posted on GitHub and it seems there is an install bug that can be fixed by:

adding /usr/local/cuda/extras/CUPTI/lib64 to your LD_LIBRARY_PATH

You can do that by editing your bash profile (for example ~/.bashrc or ~/.bash_profile, depending on your setup).
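As a rough sketch of that change, the lines below append the CUPTI directory to LD_LIBRARY_PATH unless it is already listed. The path is the one from the fix above and assumes CUDA lives under /usr/local/cuda; the variable name CUPTI_DIR is just for illustration.

```shell
# Assumed default CUDA location; adjust if CUDA is installed elsewhere.
CUPTI_DIR=/usr/local/cuda/extras/CUPTI/lib64

# Append the CUPTI directory only if LD_LIBRARY_PATH does not already contain it.
case ":${LD_LIBRARY_PATH:-}:" in
  *":${CUPTI_DIR}:"*) ;;  # already present, nothing to do
  *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CUPTI_DIR}" ;;
esac

# Show the resulting search path so you can confirm the change.
echo "$LD_LIBRARY_PATH"
```

Putting these lines in the profile file makes them apply to new terminals; in an already open terminal you would need to source the file or run the lines by hand before starting Python again.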

dartdog Avatar answered Oct 21 '22 03:10
