I followed this instruction and wrote the following code to create a Dataset for images (the COCO2014 training set):
from pathlib import Path
import tensorflow as tf
def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        image = tf.read_file(path)
        return preprocess_image(image)

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(buffer_size=len(all_image_paths))
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds
ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network
This code always runs out of both main memory (32 GB) and GPU memory (11 GB), and the process gets killed. Here are the messages shown on the terminal.
I also notice that the program gets stuck at sess.run(opt_op). What is wrong? How can I fix it?
The problem is this:
ds = ds.shuffle(buffer_size = len(all_image_paths))
The buffer that Dataset.shuffle() uses is an in-memory buffer, so you are effectively trying to load the whole dataset into memory. Because the shuffle comes after the map, the buffer holds decoded images rather than file paths: each 256x256x3 float32 image is roughly 0.75 MB, so a buffer covering the roughly 80k images in the COCO2014 training set needs on the order of 60 GB.
You have a couple of options (which you can combine) to fix this:

1. Reduce the buffer size to a much smaller number.
2. Move the shuffle() statement before the map() statement. This means we would be shuffling before we load the images, so the shuffle buffer only stores filenames rather than huge image tensors.
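For illustration, here is a minimal sketch of the reordered pipeline. It assumes the same TF 1.x-style API used in the question (tf.read_file, tf.data.experimental.AUTOTUNE) and keeps the question's function signature; only the position of shuffle() changes, so it shuffles path strings instead of decoded images.

from pathlib import Path
import tensorflow as tf

def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        return preprocess_image(tf.read_file(path))

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    # Shuffle the lightweight path strings; a full-size buffer of strings is cheap.
    ds = path_ds.shuffle(buffer_size=len(all_image_paths))
    # Decode and resize after shuffling, so only a few images are in flight at once.
    ds = ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)
    return ds

With this ordering the shuffle buffer only holds strings, so even buffer_size=len(all_image_paths) stays tiny; the only decoded images held in memory are the handful produced by map(), batch(), and prefetch().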