During training, my GPU utilization is around 40%, and the TensorFlow profiler shows that a data-copy operation is taking a lot of time (see attached picture). I presume that the "MemcpyHtoD" operation is copying each batch from CPU to GPU and is blocking the GPU from being used. Is there any way to prefetch data to the GPU, or are there other problems I am not seeing?
Here is the code for dataset:
X_placeholder = tf.placeholder(tf.float32, data.train.X.shape)
y_placeholder = tf.placeholder(tf.float32, data.train.y[label].shape)
dataset = tf.data.Dataset.from_tensor_slices({"X": X_placeholder,
                                              "y": y_placeholder})
dataset = dataset.repeat(1000)
dataset = dataset.batch(1000)
dataset = dataset.prefetch(2)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
If a TensorFlow operation has both CPU and GPU implementations, TensorFlow will automatically place the operation to run on a GPU device first. If you have more than one GPU, the GPU with the lowest ID will be selected by default. However, TensorFlow does not place operations into multiple GPUs automatically.
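As an aside, you can check where each op actually lands by enabling device-placement logging. A minimal sketch, written against the tf.compat.v1 surface (present in late TF 1.x and TF 2.x) so it runs in graph mode either way; the constant names are just for illustration:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # run in graph mode, as in the question's code

# Log each op's assigned device to stderr when the session is created.
config = tf1.ConfigProto(log_device_placement=True)
with tf1.Session(config=config) as sess:
    a = tf1.constant([1.0, 2.0], name="a")
    b = tf1.constant([3.0, 4.0], name="b")
    # If a GPU is available, this add is placed on /gpu:0 by default.
    out = sess.run(a + b)
    print(out)  # [4. 6.]
```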
Prefetching to a single GPU:

tf.data.experimental.prefetch_to_device places the prefetch buffer directly on the GPU, but it must be the last transformation in the pipeline. You can avoid that restriction by explicitly copying to the GPU with tf.data.experimental.copy_to_device(...) and then prefetching. This also lets you incorporate further tricks to optimize Dataset pipeline performance (e.g. by overriding the threadpool distribution). There is also a tf.data.experimental.AUTOTUNE option for prefetching, which allows the tf.data runtime to automatically tune the prefetch buffer size based on your system and environment. In the end, you might end up doing something like this:
dataset = dataset.apply(tf.data.experimental.copy_to_device("/gpu:0"))
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
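Putting it together with a pipeline like the one in the question, here is a minimal end-to-end sketch. It uses the tf.compat.v1 surface for graph mode, toy random data in place of data.train, and "/cpu:0" as the target device so it runs anywhere; swap in "/gpu:0" on a GPU machine:

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

device = "/cpu:0"  # use "/gpu:0" on a machine with a GPU

# Toy stand-ins for data.train.X and data.train.y[label]
X = np.random.rand(100, 4).astype(np.float32)
y = np.random.rand(100, 1).astype(np.float32)

dataset = tf1.data.Dataset.from_tensor_slices({"X": X, "y": y})
dataset = dataset.repeat(10)
dataset = dataset.batch(32)
# Copy batches to the target device, then keep a prefetch buffer there.
dataset = dataset.apply(tf.data.experimental.copy_to_device(device))
with tf.device(device):
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    iterator = tf1.data.make_initializable_iterator(dataset)
    next_element = iterator.get_next()

with tf1.Session() as sess:
    sess.run(iterator.initializer)
    batch = sess.run(next_element)
    print(batch["X"].shape)  # (32, 4)
```

Note that the prefetch and the iterator are created inside tf.device(device), so the buffered batches already live on the target device when get_next() is consumed.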