Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error using Tensorflow with GPU

I've tried a bunch of different Tensorflow examples, which works fine on the CPU but generates the same error when I'm trying to run them on the GPU. One little example is this:

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)

The error is always the same, CUDA_ERROR_OUT_OF_MEMORY:

I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 24
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:0a:00.0
Total memory: 11.25GiB
Free memory: 105.73MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 1 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:0b:00.0
Total memory: 11.25GiB
Free memory: 133.48MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   Y Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:702] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:0b:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 105.48MiB bytes.
E tensorflow/stream_executor/cuda/cuda_driver.cc:932] failed to allocate 105.48M (110608384 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
F tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Check failed: gpu_mem != nullptr  Could not allocate GPU device memory for device 0. Tried to allocate 105.48MiB
Aborted (core dumped)

I guess that the problem has to do with my configuration rather than the memory usage of this tiny example. Does anyone have any idea?

Edit:

I've found out that the problem may be as simple as someone else running a job on the same GPU, which would explain the little amount of free memory. In that case: sorry for taking up your time...

like image 931
user5654767 Avatar asked Dec 29 '15 15:12

user5654767


People also ask

Why is TensorFlow not using my GPU?

If TensorFlow doesn't detect your GPU, it will default to the CPU, which means when doing heavy training jobs, these will take a really long time to complete. This is most likely because the CUDA and CuDNN drivers are not being correctly detected in your system.

How do I fix TensorFlow error?

If the "No module named 'tensorflow'" error persists, try restarting your IDE and development server / script. You can check if you have the tensorflow package installed by running the pip show tensorflow command. Copied!

Does TensorFlow 2.0 support GPU?

TensorFlow supports running computations on a variety of types of devices, including CPU and GPU. They are represented with string identifiers for example: "/device:CPU:0" : The CPU of your machine.


2 Answers

There appear to be two issues here:

  1. By default, TensorFlow allocates a large fraction (95%) of the available GPU memory (on each GPU device) when you create a tf.Session. It uses a heuristic that reserves 200MB of GPU memory for "system" uses, but doesn't set this aside if the amount of free memory is smaller than that.

  2. It looks like you have very little free GPU memory on either of your GPU devices (105.73MiB and 133.48MiB). This means that TensorFlow will attempt to allocate memory that should probably be reserved for the system, and hence the allocation fails.

Is it possible that you have another TensorFlow process (or some other GPU-hungry code) running while you attempt to run this program? For example, a Python interpreter with an open session—even if it is not using the GPU—will attempt to allocate almost the entire GPU memory.

Currently, the only way to restrict the amount of GPU memory that TensorFlow uses is the following configuration option (from this question):

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
like image 109
mrry Avatar answered Oct 26 '22 13:10

mrry


This can happen because your TensorFlow session is not able to get sufficient amount of memory in the GPU. Maybe you have a low amount of free memory for other processes like TensorFlow or there is another TensorFlow session running in your system . so you have to configure the amount of memory the TensorFlow session will use

if you are using TensorFlow 1.x

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

as Tensorflow 2.x has undergone major changes from 1.x.if you want to use TensorFlow 1.x versions method/function there is a compatibility module kept in TensorFlow 2.x. So TensorFlow 2.x user can use this piece of code

gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))
like image 23
ujjal das Avatar answered Oct 26 '22 12:10

ujjal das