
Cannot run TensorFlow on GPU

I want to run TensorFlow code on my GPU, but it's not working. I have CUDA and cuDNN installed, and I have a compatible GPU as well.

I took this sample from the GPU tutorial on the official website (Tensorflow tutorial for GPU):

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Here is the output:

Device mapping: no known devices.
2017-10-31 16:15:40.298845: I tensorflow/core/common_runtime/direct_session.cc:300] Device mapping:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895802: I tensorflow/core/common_runtime/simple_placer.cc:872] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895910: I tensorflow/core/common_runtime/simple_placer.cc:872] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.895961: I tensorflow/core/common_runtime/simple_placer.cc:872] a_1: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-10-31 16:15:56.896006: I tensorflow/core/common_runtime/simple_placer.cc:872] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

Every op is placed on the CPU; nothing runs on my GPU. I tried to force it onto the GPU manually with:

with tf.device('/gpu:0'):
...

This raised the following errors:

Traceback (most recent call last):
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1297, in _run_fn
    self._extend_graph()
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1358, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/abhor/anaconda3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

Caused by op 'MatMul_1', defined at:
  File "<stdin>", line 4, in <module>
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1844, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1289, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/abhor/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'MatMul_1': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_2, b_1)]]

Several lines say that only the CPU is available.

Here are my graphics card details and CUDA version.

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P0    N/A /  N/A |    274MiB /  2002MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+

Output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

I don't know how to check the cuDNN installation, but I installed it as described in the official documentation, so I assume it is working as well.
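
One way to check which cuDNN version is installed is to read the version macros from the cuDNN header; this is what the common one-liner grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h shows. Below is a minimal Python sketch of the same idea. The header paths in CANDIDATE_HEADERS are assumptions about typical install locations (adjust them for your system), and the helper names are hypothetical. cuDNN 8 and later keep these macros in cudnn_version.h instead of cudnn.h.

```python
import re
from pathlib import Path

# Assumed header locations; adjust for your installation.
CANDIDATE_HEADERS = [
    Path("/usr/local/cuda/include/cudnn_version.h"),  # cuDNN 8 and later
    Path("/usr/local/cuda/include/cudnn.h"),          # cuDNN 7 and earlier (tar install)
    Path("/usr/include/cudnn.h"),                     # assumed .deb package location
]


def parse_cudnn_version(header_text):
    """Return 'MAJOR.MINOR.PATCH' from the cuDNN version macros, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{macro}\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def find_cudnn_version(candidates=CANDIDATE_HEADERS):
    """Return (header_path, version) for the first header that parses, or None."""
    for path in candidates:
        if path.is_file():
            version = parse_cudnn_version(path.read_text(errors="ignore"))
            if version is not None:
                return str(path), version
    return None


if __name__ == "__main__":
    result = find_cudnn_version()
    print(result if result else "No cuDNN header found in the assumed locations")
```

If nothing is found, cuDNN may be installed somewhere else, or not at all; searching the file system for the header is the next step.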

EDIT: Output of pip3 list | grep tensorflow:

tensorflow-gpu (1.3.0)
tensorflow-tensorboard (0.1.8)
Abhor asked Oct 31 '17 17:10

2 Answers

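Try creating the session with allow_soft_placement=True, which lets TensorFlow place an op on an available device (here, the CPU) instead of raising InvalidArgumentError when the requested device does not exist. Note that this removes the error but does not make the GPU visible; the op still runs on the CPU.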

import tensorflow as tf

sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
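
To make the effect of soft placement concrete, here is a toy model of explicit device placement in plain Python. It is a simplified illustration under stated assumptions, not TensorFlow's actual placer, and the functions device_key and place are hypothetical. With only the CPU available, soft placement turns the InvalidArgumentError from the question into a silent fallback to the CPU, so the op still does not run on the GPU.

```python
import re


def device_key(name):
    """Reduce a device string such as '/gpu:0' or '/device:GPU:0' to ('GPU', 0)."""
    match = re.search(r"(cpu|gpu):(\d+)$", name, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised device string: {name!r}")
    return match.group(1).upper(), int(match.group(2))


def place(requested, available, allow_soft_placement=False):
    """Toy model: return the device an explicitly placed op would run on."""
    if not available:
        raise ValueError("no devices available")
    by_key = {device_key(d): d for d in available}
    key = device_key(requested)
    if key in by_key:
        return by_key[key]  # the requested device exists, so use it
    if not allow_soft_placement:
        # Mirrors the shape of the InvalidArgumentError in the question.
        raise ValueError(
            f"Operation was explicitly assigned to {requested} "
            f"but available devices are {available}")
    # Soft placement: fall back to a CPU if there is one, else any device.
    for (dev_type, _), dev in by_key.items():
        if dev_type == "CPU":
            return dev
    return available[0]


cpu_only = ["/job:localhost/replica:0/task:0/cpu:0"]
print(place("/gpu:0", cpu_only, allow_soft_placement=True))
# prints /job:localhost/replica:0/task:0/cpu:0
```

The point of the model is that soft placement changes only what happens when the requested device is missing; it cannot add a GPU to the list of available devices.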

Luís Carlos Silva Eiras answered Oct 22 '22 10:10


Actually, TensorFlow cannot find the CUDA GPU at all in your case.

Look at the list of available devices in the error message:

Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]

This means no GPU was found. To list the GPUs that TensorFlow can actually find, you can use the code from How to get current available GPUs in tensorflow?:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

# An empty list means TensorFlow cannot see any GPU.
print(get_available_gpus())

Make sure this actually returns at least one GPU; only then can TensorFlow use a GPU device.

There are many reasons a GPU might not be found, including (but not limited to) the CUDA installation and settings, the TensorFlow version, and the GPU model, especially its compute capability. Check which CUDA and cuDNN versions and which GPU models your TensorFlow version supports, and check your GPU's compute capability (for NVIDIA GPUs).
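
As a starting point for the version check, here is a minimal sketch that compares the CUDA release reported by nvcc -V with the CUDA version a tensorflow-gpu release was built against. The TESTED_BUILDS table is an assumption copied from TensorFlow's "tested build configurations" list for a few 1.x releases; it is deliberately partial, so verify it against the official table for your version. The helper names are hypothetical. Applied to the numbers in the question (tensorflow-gpu 1.3.0, CUDA release 9.0), it reports a mismatch.

```python
import re

# Partial table (assumption; verify against TensorFlow's tested build
# configurations): CUDA and cuDNN major versions used for prebuilt binaries.
TESTED_BUILDS = {
    "1.3": {"cuda": "8", "cudnn": "6"},
    "1.4": {"cuda": "8", "cudnn": "6"},
    "1.5": {"cuda": "9", "cudnn": "7"},
}


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. '9.0') from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def check_cuda(tf_version, cuda_release):
    """Return (ok, message); ok is None when the TF version is not in the table."""
    series = ".".join(tf_version.split(".")[:2])  # '1.3.0' -> '1.3'
    build = TESTED_BUILDS.get(series)
    if build is None:
        return None, f"No entry for TensorFlow {series}; check the official table."
    installed_major = cuda_release.split(".")[0]  # '9.0' -> '9'
    if installed_major == build["cuda"]:
        return True, f"TensorFlow {tf_version} expects CUDA {build['cuda']}.x: OK."
    return False, (f"TensorFlow {tf_version} expects CUDA {build['cuda']}.x and "
                   f"cuDNN {build['cudnn']}, but CUDA {cuda_release} is installed.")


nvcc_output = "Cuda compilation tools, release 9.0, V9.0.176"
ok, message = check_cuda("1.3.0", parse_nvcc_release(nvcc_output))
print(ok, message)
```

Comparing only the CUDA major version keeps the sketch simple; a stricter check would also compare the minor version and the cuDNN version found on the system.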

Kelly Hwong answered Oct 22 '22 09:10