I have a training dataset that is too big to fit into memory, so my code reads only 1,000 records from disk at a time. Now I would like to use TensorFlow's new Dataset API. Does the Dataset API allow me to specify the number of records to keep in memory, or does TensorFlow automatically manage memory so that I don't have to?
Using the tf.data.Dataset.from_tensor_slices() method, we can get the slices of an array in the form of dataset objects.
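As a minimal sketch (assuming TensorFlow 2.x with eager execution; the features array here is a made-up example):

import numpy as np
import tensorflow as tf

## A small illustrative array. from_tensor_slices() keeps the whole
## array in memory, so it only suits data that already fits in RAM.
features = np.arange(10, dtype=np.float32)
dataset = tf.data.Dataset.from_tensor_slices(features)

for element in dataset:
    print(element.numpy())  ## each element is a scalar tf.Tensor

Because from_tensor_slices() holds the data in memory, a file-based source such as TFRecordDataset (shown in the answer below) is the right choice for a dataset that does not fit in RAM.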
To iterate over the dataset several times, use .repeat(). We can enumerate each batch using either Python's built-in enumerate() or the Dataset.enumerate() method; the latter yields the index as a tensor, which is useful when the loop must stay inside the tf.data pipeline. Both styles are sketched below.
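A small sketch of the two enumeration styles, again assuming TensorFlow 2.x eager execution:

import tensorflow as tf

dataset = tf.data.Dataset.range(6).batch(2).repeat(2)  ## iterate over the data twice

## Python's enumerate(): the index i is a plain Python int.
for i, batch in enumerate(dataset):
    print(i, batch.numpy())

## Dataset.enumerate(): the index is itself a tf.Tensor, so it stays
## inside the tf.data pipeline.
for i, batch in dataset.enumerate():
    print(i.numpy(), batch.numpy())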
The dataset can also be configured for performance using the AUTOTUNE setting in the tf.data module. Buffered prefetching ensures that data can be taken from disk without I/O becoming blocking.
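A hedged sketch of buffered prefetching (tf.data.AUTOTUNE is available from TF 2.4; older releases expose it as tf.data.experimental.AUTOTUNE):

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  ## tf.data.experimental.AUTOTUNE on older versions

dataset = (
    tf.data.Dataset.range(1000)
    .batch(32)
    ## Overlap producing the next batch with consuming the current one,
    ## letting the runtime choose the buffer size.
    .prefetch(AUTOTUNE)
)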
Yes. Here is an example from the official guide (Using the Dataset API for TensorFlow Input Pipelines, https://www.tensorflow.org/programmers_guide/datasets):
import tensorflow as tf

filenames = ["/var/data/file1.tfrecord", "/var/data/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)  ## tf.contrib.data.TFRecordDataset in pre-1.4 releases
dataset = dataset.map(...) ## Parse each record with a user-specified function
dataset = dataset.shuffle(buffer_size=10000) ## 10000: size of the sample/record pool for random selection
dataset = dataset.repeat() ## no count given: keep repeating indefinitely
dataset = dataset.batch(32) ## 32: number of samples/records per batch (to be read into memory)
You specify the number of records to read at a time via batch_size; in that case TF grabs only batch_size elements from the file. If you also specify shuffle, then it is guaranteed that at most buffer_size elements are held in memory at any time.
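As a quick sketch of consuming such a pipeline (assuming TF 2.x eager execution, and assuming the map(...) placeholder above has been replaced with a real parsing function):

## Pull batches one at a time; only roughly buffer_size + batch_size
## records are resident in memory at any moment.
for batch in dataset.take(3): ## take(3): stop after three batches
    print(batch)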
I verified this on my TFRecord files. I have 100 of them, each ~10 GB (which is more than the memory on my laptop), and everything works fine.