I'm trying to run the TensorFlow seq2seq example, but it won't use the GPU. Here are the steps I took to install TensorFlow on a Linux system with a Tesla K20X:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
./configure # Yes GPU
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu # The GPU is being used
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
After all of these steps, I have TensorFlow installed. I then try to run the seq2seq example with
bazel run -c opt //tensorflow/models/rnn/translate:translate
but it does not use the GPU. I then try running the example
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
and it gives an error
bazel-bin/tensorflow/cc/tutorials_example_trainer: error while loading shared libraries: /path/to/home/.cache/bazel/_bazel_hduong/9e8a6e75473e7bf5c9d1c8a084e2a0e9/tensorflow/bazel-out/local_linux-opt/bin/tensorflow/cc/../../_solib_local/_U_S_Sthird_Uparty_Sgpus_Scuda_Ccudart___Uthird_Uparty_Sgpus_Scuda_Slib64/libcudart.so.7.0: file too short
I was wondering if anyone knows what might cause the program not to use the GPU. Any help is appreciated.
Thank you.
If TensorFlow doesn't detect your GPU, it falls back to the CPU, which means heavy training jobs will take a very long time to complete. This is most likely because the CUDA and cuDNN libraries are not being correctly detected on your system.
If a TensorFlow operation has both CPU and GPU implementations, TensorFlow will automatically place the operation to run on a GPU device first. If you have more than one GPU, the GPU with the lowest ID will be selected by default. However, TensorFlow does not place operations into multiple GPUs automatically.
TensorFlow supports running computations on a variety of devices, including CPU and GPU. They are represented as string identifiers, for example "/device:CPU:0" for the CPU of your machine and "/device:GPU:0" for the first GPU.
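A quick way to confirm where operations actually end up is to turn on device placement logging. Below is a minimal sketch using the Session-style API of that era; the "/gpu:0" pin and the op names are only illustrative:

import tensorflow as tf

# Log every op's device assignment; ops with GPU kernels should land on /gpu:0.
with tf.device("/gpu:0"):  # explicit pinning; remove this line to let TensorFlow choose
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    c = a * b

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))  # the console log shows which device ran each op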
The issue looks to be that when you bazel run the translation example, it rebuilds TensorFlow without GPU support. Try adding --config=cuda to the bazel run command, as follows:
$ bazel run -c opt --config=cuda //tensorflow/models/rnn/translate:translate
Without this option, Bazel will recompile the entire TensorFlow runtime without GPU support, and use this version when it runs the example application.
This error occurs because CUDA is not properly linked. Enter the following command in the terminal:
sudo ldconfig /usr/local/cuda/lib64
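After running ldconfig, you can verify that the CUDA runtime actually loads. This is only a diagnostic sketch; the library name libcudart.so.7.0 matches the error message above, but yours may differ depending on your CUDA version:

import ctypes

try:
    ctypes.CDLL("libcudart.so.7.0")  # the same library the loader complained about
    print("libcudart loaded successfully")
except OSError as e:
    print("libcudart still cannot be loaded:", e)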
I'm guessing you should install the GPU version of the package instead:
pip install tensorflow-gpu
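As a quick sanity check after installing the GPU package, you can ask TensorFlow whether it was built with CUDA and whether it can see a GPU (assuming a TensorFlow 1.x-era install where these tf.test helpers are available):

import tensorflow as tf

print(tf.test.is_built_with_cuda())  # True if this build was compiled with CUDA support
print(tf.test.gpu_device_name())     # e.g. "/device:GPU:0" if a GPU is visible, "" otherwise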