 

GPU under-utilization using the TensorFlow Dataset API

[Attached image: TF profiler trace]

During training, my GPU utilization is around 40%, and the TensorFlow profiler clearly shows a data-copy operation that is using a lot of time (see the attached trace). I presume that the "MemcpyHtoD" operation is copying each batch from the CPU (host) to the GPU (device) and is blocking the GPU from being used. Is there any way to prefetch data to the GPU? Or are there other problems that I am not seeing?

Here is the code for the dataset:

import tensorflow as tf

# Feed the training arrays in through placeholders so they are not
# baked into the graph as constants.
X_placeholder = tf.placeholder(tf.float32, data.train.X.shape)
y_placeholder = tf.placeholder(tf.float32, data.train.y[label].shape)

dataset = tf.data.Dataset.from_tensor_slices({"X": X_placeholder,
                                              "y": y_placeholder})
dataset = dataset.repeat(1000)   # repeat the data 1000 times
dataset = dataset.batch(1000)    # batches of 1000 examples
dataset = dataset.prefetch(2)    # prefetch 2 batches (on the host)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
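
For context, here is a minimal sketch of how such an iterator is typically initialized and driven in a TF 1.x session; train_step is a hypothetical training op standing in for the real one:

with tf.Session() as sess:
    # Feed the actual numpy arrays once, when initializing the iterator.
    sess.run(iterator.initializer,
             feed_dict={X_placeholder: data.train.X,
                        y_placeholder: data.train.y[label]})
    while True:
        try:
            sess.run(train_step)  # the training op reads next_element
        except tf.errors.OutOfRangeError:
            break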
Molly Zhang · asked Jan 20 '18


1 Answer

Prefetching to a single GPU:

  • Consider using a more flexible approach than prefetch_to_device, e.g. by explicitly copying to the GPU with tf.data.experimental.copy_to_device(...) and then prefetching. This avoids the restriction that prefetch_to_device must be the last transformation in the pipeline, and lets you incorporate further tricks to optimize Dataset pipeline performance (e.g. by overriding the threadpool distribution, as sketched after this list).
  • Try out the experimental tf.data.experimental.AUTOTUNE option for prefetching, which lets the tf.data runtime automatically tune the prefetch buffer size based on your system and environment.
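
For the threadpool point, here is a minimal sketch of one way to do it in later TF 1.x releases (1.14+) via tf.data.Options; the pool size of 8 is an arbitrary example value, and whether it helps depends on your host:

# Give the input pipeline its own thread pool (TF 1.14+ API).
options = tf.data.Options()
options.experimental_threading.private_threadpool_size = 8   # example value
options.experimental_threading.max_intra_op_parallelism = 1
dataset = dataset.with_options(options)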

In the end, you might end up with something like this:

# Copy each element to the GPU, then let tf.data tune the prefetch
# buffer size automatically.
dataset = dataset.apply(tf.data.experimental.copy_to_device("/gpu:0"))
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
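
One caveat worth noting: with copy_to_device, the iterator itself should, as far as I understand, be created on the target device so the prefetch buffer actually lives in GPU memory. A sketch:

# Create the iterator inside a GPU device scope so prefetching
# happens on the device rather than the host.
with tf.device("/gpu:0"):
    iterator = dataset.make_initializable_iterator()
    next_element = iterator.get_next()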
Alaroff · answered Oct 02 '22