TensorFlow transition to GPU version

I've worked with TensorFlow for a while, and everything worked properly until I tried to switch to the GPU version.

What I did:

  1. Uninstalled the previous TensorFlow.
  2. pip installed tensorflow-gpu (v2.0).
  3. Downloaded and installed Visual Studio Community 2019.
  4. Downloaded and installed CUDA 10.1.
  5. Downloaded and installed cuDNN.

I tested with the CUDA sample "deviceQuery_vs2019" and the test passed, reporting an Nvidia GeForce RTX 2070.

When I run a previously working file, I get the error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.

After some research I found that the supported CUDA version is 10.0, so I downgraded and changed the CUDA path, but nothing changed.
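One way to see whether another CUDA runtime is still being picked up after a downgrade is to list every directory on PATH that contains a CUDA runtime DLL. The following is a minimal sketch, not an official diagnostic. It assumes the Windows naming convention cudart64_<version>.dll (for example cudart64_100.dll for CUDA 10.0 and cudart64_101.dll for CUDA 10.1), and the helper name find_cuda_runtimes is hypothetical.

```python
import fnmatch
import os


def find_cuda_runtimes(path_value=None, pattern="cudart64_*.dll"):
    """Return (directory, filename) pairs for CUDA runtime DLLs found on PATH.

    Assumption: runtime DLLs follow the cudart64_<version>.dll naming scheme.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty PATH entries and entries that no longer exist.
        if not directory or not os.path.isdir(directory):
            continue
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            # Unreadable directory: ignore it rather than abort the scan.
            continue
        for name in names:
            # Compare in lower case because Windows file names are case-insensitive.
            if fnmatch.fnmatch(name.lower(), pattern):
                hits.append((directory, name))
    return hits


if __name__ == "__main__":
    for directory, name in find_cuda_runtimes():
        print(name, "in", directory)
```

If more than one version shows up, the directory that appears first on PATH is usually the one that gets loaded, so a leftover 10.1 entry ahead of the 10.0 one would be a likely suspect.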

Using this code:


import tensorflow as tf
print("Num GPUs Available: ", 
len(tf.config.experimental.list_physical_devices('GPU')))

I get:

2019-10-01 16:55:03.317232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-01 16:55:03.420537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
Num GPUs Available:  1
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-10-01 16:55:03.421029: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-10-01 16:55:03.421849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
[Finished in 2.01s]

CUDA seems to recognize the card, and so does TensorFlow, but I cannot get rid of the error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.

What am I doing wrong? Should I stick with CUDA 10.0? Am I missing a piece of the installation?

m4l4 Avatar asked Oct 01 '19 15:10

1 Answer

Solved. It's mostly an alchemy of versions to avoid conflicts. Here's what I did (order matters, as far as I know):

  1. Uninstall everything (TensorFlow, CUDA, Visual Studio).
  2. pip install tensorflow-gpu
  3. Download and install Visual Studio Community 2017 (2019 won't work).
  4. I also installed the C++ workload from Visual Studio (not sure if it's necessary, but it includes the required Visual C++ 15.x compiler).
  5. Download and install CUDA 10.0 (the one I have is 10.0.130).
  6. Go to the system environment variables (search for it in the Windows search bar) > Advanced > click Environment Variables...
  7. Create a new user variable (not to be confused with a system variable):
  8. Variable name: CUDA_PATH
  9. Variable value: browse to the CUDA directory, down to the version directory (mine is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0).
  10. The guide says you need cuDNN 7.4.1, but I got an error saying the expected version was 7.6 minimum. Go to the Nvidia developer cuDNN archive and download "cudnn v7.6.0 for CUDA 10.0" (be sure you get the right file). Unzip it and put the cuDNN files into the corresponding CUDA directories (lib, include, bin).
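The manual steps above can be sanity-checked with a short script. This is only a sketch under stated assumptions: it assumes the usual Windows layout in which the CUDA 10.0 runtime is bin\cudart64_100.dll and the copied cuDNN 7.x files are bin\cudnn64_7.dll, include\cudnn.h and lib\x64\cudnn.lib. The function name missing_cuda_files and the EXPECTED_FILES list are made up for illustration and would need adjusting for other versions.

```python
import os

# Files expected under CUDA_PATH after installing CUDA 10.0 and copying cuDNN 7.x.
# Assumption: standard Windows file names for these versions.
EXPECTED_FILES = [
    os.path.join("bin", "cudart64_100.dll"),
    os.path.join("bin", "cudnn64_7.dll"),
    os.path.join("include", "cudnn.h"),
    os.path.join("lib", "x64", "cudnn.lib"),
]


def missing_cuda_files(cuda_path, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under cuda_path."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(cuda_path, rel))
    ]


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if not cuda_path:
        print("CUDA_PATH is not set")
    else:
        missing = missing_cuda_files(cuda_path)
        print("missing:", missing if missing else "nothing")
```

An empty result only shows that the files are where this sketch expects them; it does not prove that the versions are compatible with the installed TensorFlow build.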

From there everything worked like a charm. I haven't been able to build the CUDA sample (deviceQuery) from Visual Studio, but that's not a vital step. Almost every error was due to incompatible versions of the files; it took me 3-4 days to figure out the right mix. Hope that helps :)

m4l4 Avatar answered Sep 30 '22 07:09