As far as I'm aware, there is still no asynchronous prefetching of data between CPU and GPU in TensorFlow 1.4.
https://github.com/tensorflow/tensorflow/issues/5722
I am trying to code this functionality myself as an exercise in understanding.
The following code attempts to implement this process:
- prefetch_op assigns the next batch from a Dataset object (CPU based) into temp_var, a GPU-based variable.
- myop, in this example just a trivial operation, runs; it depends on the data_var variable.
- tf.control_dependencies(...) waits for myop and prefetch_op to complete, then assigns temp_var to data_var.

This doesn't appear to work. The TF profiler shows that myop is not processed asynchronously with the MEMCPYHtoD transfer, as was hoped.
I had expected that the two OPs, myop, and prefetch_op would run asynchronously because there are no dependencies between them.

Here is the code I used to run this test. It will run stand-alone.
import tensorflow as tf
from tensorflow.python.client import timeline
import numpy as np
import os
sz = 2000
x = np.random.rand(sz, sz)
def gen():
    yield x
# Dataset
ds = tf.data.Dataset.from_generator(generator=gen, output_types=tf.float64)
ds = ds.repeat()
ds = ds.prefetch(2)
iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()
# Prefetch to GPU OPs - this is expected to happen asynchronously
temp_var = tf.Variable(np.zeros((sz, sz)), name='temp_var', dtype=tf.float64, expected_shape=(sz, sz), trainable=False)
data_var = tf.Variable(np.zeros((sz, sz)), name='data_var', dtype=tf.float64, expected_shape=(sz, sz), trainable=False)
prefetch_op = tf.assign(temp_var, next_element)
# Trivial math operation for timing purposes
myop = tf.sqrt(data_var, name='myop')
# Final prefetch to GPU operation, copy data from temp_var to data_var
with tf.control_dependencies((myop, prefetch_op)):
    assign_op = tf.assign(data_var, temp_var)
# Open session, initialize, and run 1 iteration to warm the prefetch buffer
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run((myop, assign_op))
# Main sess.run with profiling on
tf_options_profiler_on = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
tf_run_metadata = tf.RunMetadata()
graph_result = sess.run((myop, assign_op), options=tf_options_profiler_on, run_metadata=tf_run_metadata)
# Write profile data
chrome_trace = timeline.Timeline(tf_run_metadata.step_stats).generate_chrome_trace_format()
os.makedirs('/tmp/profile', exist_ok=True)
with open('/tmp/profile/tf_profile_step.json', 'w') as f:
    f.write(chrome_trace)
print(graph_result)
print('Writing profiler output to /tmp/profile')
In TensorFlow 1.7, the Dataset API now has prefetch_to_device.
Documentation:
https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/data/prefetch_to_device
Github discussion:
https://github.com/tensorflow/tensorflow/issues/13610#issuecomment-364331935
It looks like another option, called MultiDeviceIterator, is mentioned further down in the above GitHub discussion (now closed).
https://github.com/tensorflow/tensorflow/issues/13610#issuecomment-411893139