
How to cache data during the first epoch correctly (Tensorflow, dataset)?

I'm trying to use the cache transformation for a dataset. Here is my current code (simplified):

dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=1)
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=5000, count=1))
dataset = dataset.map(_parser_a, num_parallel_calls=12)
dataset = dataset.padded_batch(
    20, 
    padded_shapes=padded_shapes,
    padding_values=padding_values
)
dataset = dataset.prefetch(buffer_size=1)
dataset = dataset.cache()

After the first epoch, I received the following error message:

The calling iterator did not fully read the dataset we were attempting to cache. In order to avoid unexpected truncation of the sequence, the current [partially cached] sequence will be dropped. This can occur if you have a sequence similar to dataset.cache().take(k).repeat(). Instead, swap the order (i.e. dataset.take(k).cache().repeat())

Then, the code proceeded and still read data from the hard drive instead of the cache. So, where should I place dataset.cache() to avoid the error? Thanks.

Maosi Chen asked May 24 '18 23:05

People also ask

What does dataset cache do?

The Dataset.cache transformation can cache a dataset, either in memory or on local storage. This saves some operations (like file opening and data reading) from being executed during each epoch.
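For illustration, a minimal sketch of both caching modes (the file name, cache path, and parse function are placeholders, not taken from the question):

import tensorflow as tf

def parse_fn(record):
    # Placeholder parser; a real one would use tf.parse_single_example
    return record

dataset = tf.data.TFRecordDataset(["data.tfrecord"])  # placeholder file name
dataset = dataset.map(parse_fn)
dataset = dataset.cache()                              # cache parsed records in memory
# dataset = dataset.cache("/tmp/records_cache")        # or cache to a file on local storage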

How can TensorFlow be used to configure the dataset for performance?

TensorFlow can be used to configure the dataset for performance using the AUTOTUNE attribute in the tf.data module. Buffered prefetching is used so that data can be read from disk without I/O becoming blocking.
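A short sketch of that idea, assuming a dataset and parse_fn like the ones above (in TF 1.13+ the constant lives under tf.data.experimental; in TF 2.x it is also exposed as tf.data.AUTOTUNE):

AUTOTUNE = tf.data.experimental.AUTOTUNE

dataset = dataset.map(parse_fn, num_parallel_calls=AUTOTUNE)  # let the runtime pick the parallelism
dataset = dataset.prefetch(buffer_size=AUTOTUNE)              # overlap preprocessing with training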

How many batches are in TensorFlow dataset?

The number of batches depends on the batch size you choose; by default (for example, in Keras' model.fit), the batch size (batch_size) is 32.

What is PrefetchDataset?

Creates a dataset that asynchronously prefetches elements from input_dataset .
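This is the kind of dataset produced by the dataset.prefetch(buffer_size=1) line in the question: while the consumer works on the current batch, the next one is prepared in the background. A minimal sketch:

dataset = dataset.batch(20)
dataset = dataset.prefetch(buffer_size=1)  # keep one batch ready while the current one is consumed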


1 Answer

The implementation of the Dataset.cache() transformation is fairly simple: it builds up a list of the elements that pass through it as you iterate over it completely the first time, and it returns elements from that list on subsequent attempts to iterate over it. If the first pass only partially reads the data, the list is incomplete, and TensorFlow doesn't try to use the cached data, because it doesn't know whether the remaining elements will be needed, and in general it might need to reprocess all the preceding elements to compute the remaining elements.

If you modify your program to consume the entire dataset, iterating over it until tf.errors.OutOfRangeError is raised, the cache will contain a complete list of the elements in the dataset, and it will be used on all subsequent iterations.
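A minimal TF 1.x-style sketch of that fix, assuming the pipeline from the question ends in dataset.cache() and that the iterator is re-initialized for each epoch (num_epochs is a placeholder, not from the question):

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for epoch in range(num_epochs):        # num_epochs is a placeholder
        sess.run(iterator.initializer)
        while True:
            try:
                sess.run(next_element)     # consume every element; the first complete pass fills the cache
            except tf.errors.OutOfRangeError:
                break                      # reached the end of the dataset for this epoch

Because the first pass runs to OutOfRangeError, the cache is fully populated, and later epochs are served from it instead of re-reading the TFRecord files.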

mrry answered Oct 06 '22 11:10