I am trying to train an autoencoder using TensorFlow and Keras. My training data consists of more than 200K unlabeled 512x128 images. If I load all of the data into a single matrix, its shape will be (200000, 512, 128, 3), which takes well over 100 GB of RAM in float32. I know I can reduce the batch size while training, but that only limits memory usage on the GPU/CPU.
Is there a workaround to this problem?
You can use the tf.data API to load the images lazily from disk instead of building the whole matrix in memory; the official tf.data guide goes into the details.
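As a minimal sketch of such a pipeline (the glob pattern, image format, and target size below are placeholders; adjust them to your data):

```python
import tensorflow as tf

IMG_HEIGHT, IMG_WIDTH = 512, 128  # adjust to your actual image dimensions

def load_image(path):
    # Read and decode a single image file only when it is actually needed
    raw = tf.io.read_file(path)
    img = tf.io.decode_image(raw, channels=3, expand_animations=False)
    img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])
    img = tf.cast(img, tf.float32) / 255.0
    return img

# "images/*.png" is an illustrative glob; point it at your own files
paths = tf.data.Dataset.list_files("images/*.png", shuffle=True)
dataset = paths.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)

# For an autoencoder the input is also the target
dataset = dataset.map(lambda x: (x, x))
```

This way only the file paths live in memory; the images themselves are decoded batch by batch during training.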
Also look into the tf.data.Dataset.prefetch, tf.data.Dataset.batch, and tf.data.Dataset.cache methods to optimize the performance of the input pipeline.
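A rough sketch of how those methods could be chained onto the `dataset` built above (the batch size and cache file name are just example values):

```python
BATCH_SIZE = 32  # example value; tune for your GPU memory

dataset = (
    dataset
    .cache("cache.tf-data")       # cache decoded images to a local file after the first epoch
    .shuffle(buffer_size=1000)    # shuffle within a bounded buffer, not the whole dataset
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)   # overlap preprocessing with training on the GPU
)

# model.fit(dataset, epochs=10)   # the autoencoder can train directly on the dataset
```

Caching to a file (rather than calling `.cache()` with no argument) avoids pulling all 200K decoded images back into RAM, which is the problem you started with.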
You can also preprocess the data into TFRecord files ahead of time so that they can be read more efficiently in your training pipeline.
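A hedged sketch of what writing and then reading a TFRecord file could look like, assuming your images are stored as encoded PNG/JPEG bytes and `image_paths` is your own list of file paths (the file name "images.tfrecord" is illustrative):

```python
import tensorflow as tf

# Writing: serialize each image once, ahead of training
def serialize_example(image_bytes):
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

with tf.io.TFRecordWriter("images.tfrecord") as writer:
    for path in image_paths:  # image_paths: your own list of image file paths
        with open(path, "rb") as f:
            writer.write(serialize_example(f.read()))

# Reading: parse the records back inside the tf.data pipeline
feature_spec = {"image": tf.io.FixedLenFeature([], tf.string)}

def parse_example(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    img = tf.io.decode_image(parsed["image"], channels=3, expand_animations=False)
    img = tf.cast(img, tf.float32) / 255.0
    return img, img  # autoencoder: input == target

dataset = tf.data.TFRecordDataset("images.tfrecord").map(
    parse_example, num_parallel_calls=tf.data.AUTOTUNE
)
```

In practice you would usually shard the data across several TFRecord files so they can be read in parallel, but a single file is enough to show the idea.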