 

Keras: load images batch wise for large dataset

Tags:

keras

Is it possible in Keras to load only one batch into memory at a time? I have a 40GB dataset of images.

If the dataset were small I could use ImageDataGenerator to generate batches, but because the dataset is so large I can't load all the images into memory.

Is there any method in Keras to do something similar to the following TensorFlow code:

path_queue = tf.train.string_input_producer(input_paths, shuffle=False)
paths, contents = reader.read(path_queue)
inputs = decode(contents)
input_batch = tf.train.batch([inputs], batch_size=2)

I am using this method to serialize inputs in TensorFlow, but I don't know how to achieve the same thing in Keras.

asked Nov 09 '17 by Mohbat Tharani


People also ask

What method is used to fit a model on batches from an ImageDataGenerator?

You can do this by passing the generator to the model's fit_generator() function (fit() in newer versions of Keras). The data generator itself is an iterator that returns batches of image samples when requested.
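For instance, a minimal sketch with flow_from_directory, assuming an already compiled model and a hypothetical data/train directory with one subfolder per class:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)

# hypothetical layout: data/train/<class_name>/*.jpg
train_gen = datagen.flow_from_directory('data/train',
                                        target_size=(224, 224),
                                        batch_size=32,
                                        class_mode='categorical')

# the generator yields (images, labels) batches, reading images from disk as needed;
# model is assumed to be an already compiled Keras model
model.fit_generator(train_gen,
                    steps_per_epoch=train_gen.samples // 32,
                    epochs=10)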

What is batch size in ImageDataGenerator?

For example, if you have 1000 images in your dataset and the batch size is defined as 10, then the ImageDataGenerator will produce 10 images in each iteration of the training. An iteration is one step of the epoch; the number of steps per epoch is the total number of samples divided by the batch size.
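In code, that is just integer division over the numbers in the example above:

num_samples = 1000
batch_size = 10
steps_per_epoch = num_samples // batch_size  # 100 batches (iterations) per epoch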


1 Answer

Keras models have the method fit_generator(). It accepts a Python generator or a Keras Sequence as input.

You can create a simple generator like this:

fileList = listOfFiles     

def imageLoader(files, batch_size):

    L = len(files)

    #this line is just to make the generator infinite, keras needs that    
    while True:

        batch_start = 0
        batch_end = batch_size

        while batch_start < L:
            limit = min(batch_end, L)
            X = someMethodToLoadImages(files[batch_start:limit])
            Y = someMethodToLoadTargets(files[batch_start:limit])

            yield (X,Y) #a tuple with two numpy arrays with batch_size samples     

            batch_start += batch_size   
            batch_end += batch_size
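
Here someMethodToLoadImages and someMethodToLoadTargets stand for whatever loading logic fits your data. Purely as an illustration, assuming files contains paths to images on disk, the image-loading half could be sketched like this (the fixed target_size is an assumption):

from keras.preprocessing.image import load_img, img_to_array
import numpy as np

def someMethodToLoadImages(paths):
    # read each image from disk, resize to a fixed shape and scale to [0, 1]
    images = [img_to_array(load_img(p, target_size=(224, 224))) for p in paths]
    return np.stack(images) / 255.0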

And fit like this:

model.fit_generator(imageLoader(fileList, batch_size), steps_per_epoch=..., epochs=..., ...)

Normally, you pass to steps_per_epoch the number of batches you will take from the generator in one epoch, typically ceil(len(fileList) / batch_size).

You can also implement your own Keras Sequence. It's a little more work, but the Keras docs recommend it if you're going to load data with multiple workers (multiprocessing), because a Sequence guarantees each sample is used exactly once per epoch.
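
A minimal sketch of such a Sequence, reusing the same hypothetical loading helpers from the generator above:

from keras.utils import Sequence
import numpy as np

class ImageSequence(Sequence):
    def __init__(self, files, batch_size):
        self.files = files
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.files) / float(self.batch_size)))

    def __getitem__(self, idx):
        # load only the files belonging to batch number idx
        batch = self.files[idx * self.batch_size:(idx + 1) * self.batch_size]
        X = someMethodToLoadImages(batch)
        Y = someMethodToLoadTargets(batch)
        return X, Y

You can then pass ImageSequence(fileList, batch_size) to fit_generator() in place of the generator, optionally with workers=... and use_multiprocessing=True for parallel loading.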

answered Sep 20 '22 by Daniel Möller