
Why would this dataset implementation run out of memory?

I followed these instructions and wrote the following code to create a Dataset for images (the COCO2014 training set):

from pathlib import Path
import tensorflow as tf


def image_dataset(filepath, image_size, batch_size, norm=True):
    def preprocess_image(image):
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0,1] range
        return image

    def load_and_preprocess_image(path):
        image = tf.read_file(path)
        return preprocess_image(image)

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    ds = ds.shuffle(buffer_size = len(all_image_paths))
    ds = ds.repeat()
    ds = ds.batch(batch_size)
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)

    return ds

ds = image_dataset(train2014_dir, (256, 256), 4, False)
image = ds.make_one_shot_iterator().get_next('images')
# image is then fed to the network

This code always runs out of both system memory (32 GB) and GPU memory (11 GB), and the process gets killed. The messages shown in the terminal were attached as a screenshot.

I also noticed that the program gets stuck at sess.run(opt_op). What is wrong, and how can I fix it?

Maybe asked Jul 05 '19 03:07



1 Answer

The problem is this:

ds = ds.shuffle(buffer_size = len(all_image_paths))

The buffer that Dataset.shuffle() uses is an in-memory buffer. Because shuffle() comes after map() and buffer_size is set to the number of files, you are effectively trying to hold the whole decoded dataset in memory.
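A rough back-of-the-envelope estimate shows the scale of the problem. The sketch below assumes the COCO train2014 split contains 82,783 images (a figure not stated in the question) and that each element in the buffer is a 256x256x3 float32 tensor, which is what tf.image.resize produces with its default bilinear method.

```python
# Assumption (not from the question): COCO train2014 contains 82,783 images.
NUM_IMAGES = 82_783
HEIGHT, WIDTH, CHANNELS = 256, 256, 3  # image_size used in the question
BYTES_PER_FLOAT32 = 4  # resized images are float32 tensors

# Size of one decoded, resized image held in the shuffle buffer.
bytes_per_image = HEIGHT * WIDTH * CHANNELS * BYTES_PER_FLOAT32

# Size of the buffer when buffer_size == number of images.
full_buffer_bytes = NUM_IMAGES * bytes_per_image

print(f"per image: {bytes_per_image / 2**10:.0f} KiB")            # per image: 768 KiB
print(f"full shuffle buffer: {full_buffer_bytes / 2**30:.1f} GiB")  # full shuffle buffer: 60.6 GiB
```

Under these assumptions the buffer alone needs roughly twice the 32 GB of RAM available, before counting anything else the process allocates.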

You have a couple of options (which you can combine) to fix this:

Option 1:

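Reduce the buffer size to a much smaller number, one whose decoded images fit comfortably in RAM. The trade-off is weaker shuffling, because each output element is drawn only from the elements currently in the buffer.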

Option 2:

Move the shuffle() statement before the map() statement.

This way the shuffle happens before the images are loaded, so the shuffle buffer only stores filenames rather than huge tensors.
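Here is a minimal sketch of the question's function with the two options combined. It assumes TensorFlow 1.14+ or 2.x, so it uses tf.io.read_file in place of tf.read_file. The shuffle_buffer parameter is a hypothetical addition for illustration and is not part of the original code.

```python
from pathlib import Path

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE


def image_dataset(filepath, image_size, batch_size, norm=True, shuffle_buffer=None):
    """Build an image dataset that shuffles file paths before decoding them."""

    def load_and_preprocess_image(path):
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, image_size)
        if norm:
            image /= 255.0  # normalize to [0, 1] range
        return image

    all_image_paths = [str(f) for f in Path(filepath).glob('*')]
    # shuffle_buffer is a hypothetical knob (Option 1). Buffering every path
    # is cheap, because the buffer holds short strings and not decoded images.
    if shuffle_buffer is None:
        shuffle_buffer = len(all_image_paths)

    ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
    ds = ds.shuffle(buffer_size=shuffle_buffer)  # Option 2: shuffle paths first
    ds = ds.repeat()
    ds = ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(AUTOTUNE)
    return ds
```

With this ordering, the only decoded images held in memory are those in flight in map() and the few batches kept by prefetch(), so a full-size shuffle over the paths stays affordable.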

Stewart_R answered Oct 19 '22 10:10
