I am trying to use the new features of TF, namely the Data API, and I am not sure how prefetch works. In the code below:
def dataset_input_fn(...):
    dataset = tf.data.TFRecordDataset(filenames, compression_type="ZLIB")
    dataset = dataset.map(lambda x: parser(...))
    dataset = dataset.map(lambda x, y: image_augmentation(...),
                          num_parallel_calls=num_threads)
    dataset = dataset.shuffle(buffer_size)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)
    iterator = dataset.make_one_shot_iterator()
does it matter between which of the lines above I put dataset = dataset.prefetch(batch_size)? Or should it go after every operation that would have been using output_buffer_size if the dataset were coming from tf.contrib.data?
Prefetching overlaps the preprocessing and model execution of a training step. While the model is executing training step s, the input pipeline is reading the data for step s+1. Doing so reduces the step time to the maximum (as opposed to the sum) of the training step and the time it takes to extract the data.
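As a rough illustration of that "max instead of sum" point (the numbers below are made up, not measurements):

# Illustrative numbers only: per-step cost with and without prefetching,
# assuming the input pipeline (CPU) and the training step (GPU) can overlap.
preprocess_time = 0.030   # seconds to produce one batch
train_time = 0.050        # seconds to consume one batch
step_without_prefetch = preprocess_time + train_time   # ~0.080 s (sum)
step_with_prefetch = max(preprocess_time, train_time)  # ~0.050 s (max)
print(step_without_prefetch, step_with_prefetch)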
The prefetch() transformation produces a dataset that prefetches elements from the input dataset. It accepts one parameter, buffer_size: an integer that specifies the maximum number of elements to be prefetched.
In a discussion on GitHub I found a comment by mrry:
Note that in TF 1.4 there will be a Dataset.prefetch() method that makes it easier to add prefetching at any point in the pipeline, not just after a map(). (You can try it by downloading the current nightly build.)
and
For example, Dataset.prefetch() will start a background thread to populate an ordered buffer that acts like a tf.FIFOQueue, so that downstream pipeline stages need not block. However, the prefetch() implementation is much simpler, because it doesn't need to support as many different concurrent operations as a tf.FIFOQueue.
So it means prefetch can be put after any command, and it works on the output of the previous command. So far I have noticed the biggest performance gains by putting it only at the very end of the pipeline.
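For illustration only, here is a minimal, self-contained sketch of the pipeline from the question with prefetch added as the very last step; the parser, the augmentation function, the file name and all the sizes are made-up stand-ins for the pieces elided in the question:

import tensorflow as tf

def parser(record):
    # stand-in for the question's parser: decode a single serialized Example
    feats = tf.parse_single_example(record, {"x": tf.FixedLenFeature([], tf.float32),
                                             "y": tf.FixedLenFeature([], tf.int64)})
    return feats["x"], feats["y"]

def image_augmentation(x, y):
    # stand-in for the question's augmentation: pass the data through unchanged
    return x, y

dataset = tf.data.TFRecordDataset(["train.tfrecord"], compression_type="ZLIB")
dataset = dataset.map(parser)
dataset = dataset.map(image_augmentation, num_parallel_calls=4)
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.batch(32)
dataset = dataset.repeat(10)
dataset = dataset.prefetch(1)   # buffer one finished batch while the model trains on the previous one
iterator = dataset.make_one_shot_iterator()
features, labels = iterator.get_next()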
There is one more discussion, "Meaning of buffer_size in Dataset.map, Dataset.prefetch and Dataset.shuffle", where mrry explains a bit more about prefetch and buffers.
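To make that concrete, a small sketch (toy dataset, arbitrary numbers) of what buffer_size refers to at each point:

import tensorflow as tf

dataset = tf.data.Dataset.range(100000)        # toy dataset
dataset = dataset.shuffle(buffer_size=10000)   # shuffle: size of the in-memory sampling pool, in elements
dataset = dataset.batch(32)
dataset = dataset.prefetch(buffer_size=1)      # prefetch: how many elements of *this* stage (here, batches) to prepare ahead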
UPDATE 2018/10/01:
From version 1.7.0 the Dataset API (in contrib) has an option prefetch_to_device. Note that this transformation has to be the last one in the pipeline, and when TF 2.0 arrives contrib will be gone. To have prefetch work on multiple GPUs, please use MultiDeviceIterator (for an example see #13610) multi_device_iterator_ops.py.
https://www.tensorflow.org/versions/master/api_docs/python/tf/contrib/data/prefetch_to_device
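For reference, a minimal sketch (not from the original answer) of applying that transformation as the final step of a pipeline, assuming TF 1.7+ with a GPU available as /gpu:0 and a toy dataset:

import tensorflow as tf

dataset = tf.data.Dataset.range(1000).batch(32)
# prefetch_to_device must be the final transformation in the pipeline
dataset = dataset.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()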