I followed this instruction and wrote the following code to create a Dataset for images (the COCO2014 training set):
from pathlib import Path
import tensorflow as tf
def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        image = tf.read_file(path)
        return preprocess_image(image)

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(buffer_size=len(all_image_paths))
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds
ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network
This code always runs out of both main memory (32 GB) and GPU memory (11 GB), and the process gets killed. Here are the messages shown on the terminal.
I also notice that the program gets stuck at sess.run(opt_op). What is wrong? How can I fix it?
The problem is this:
ds = ds.shuffle(buffer_size = len(all_image_paths))
The buffer that Dataset.shuffle() uses is an in-memory buffer, so you are effectively trying to load the whole dataset into memory. Because the shuffle comes after the map, the buffer holds decoded images rather than file paths: each 256x256x3 float32 image is roughly 0.75 MB, so a buffer covering the roughly 80k images in the COCO2014 training set needs on the order of 60 GB.
You have a couple of options (which you can combine) to fix this:

1. Reduce the buffer size to a much smaller number.
2. Move the shuffle() statement before the map() statement. This means we would be shuffling before we load the images, so the shuffle buffer only stores filenames rather than huge image tensors.
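For illustration, here is a minimal sketch of the reordered pipeline. It assumes the same TF 1.x-style API used in the question (tf.read_file, tf.data.experimental.AUTOTUNE) and keeps the question's function signature; only the position of shuffle() changes, so it shuffles path strings instead of decoded images.

from pathlib import Path
import tensorflow as tf

def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        return preprocess_image(tf.read_file(path))

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    # Shuffle the lightweight path strings; a full-size buffer of strings is cheap.
    ds = path_ds.shuffle(buffer_size=len(all_image_paths))
    # Decode and resize after shuffling, so only a few images are in flight at once.
    ds = ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

With this ordering the shuffle buffer only holds strings, so even buffer_size=len(all_image_paths) stays tiny; the only decoded images held in memory are the handful produced by map(), batch(), and prefetch().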