 

Keras Data Augmentation Parameters

I read some material about data augmentation in Keras, but it is still a bit vague to me. Is there any parameter to control the number of images created from each input image in the data augmentation step? In this example, I can't see any parameter that controls the number of images created from each image.

For example, in the code below I can use a parameter (num_imgs) to control the number of images created from each input image and saved to a folder called preview; but in the real-time data augmentation there doesn't seem to be any parameter for this purpose.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
num_imgs = 20
datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
x = img_to_array(img)  # this is a Numpy array with shape (3, 150, 150)
x = x.reshape((1,) + x.shape)  # this is a Numpy array with shape (1, 3, 150, 150)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='cat', save_format='jpeg'):
    i += 1
    if i >= num_imgs:
        break  # otherwise the generator would loop indefinitely
SaraG asked Dec 15 '16

1 Answer

Data augmentation works as follows: at each training epoch, transformations with randomly selected parameters within the specified ranges are applied to all of the original images in the training set. After an epoch is completed, i.e. after the learning algorithm has been exposed to the entire training set, the next epoch starts and the training data is augmented again by applying the specified transformations to the original training data.
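You can see this by drawing the same image from the generator several times: each draw re-samples the transformation parameters, so the outputs differ. A minimal sketch, assuming a TensorFlow-style channels_last image format and a dummy random image (epoch_1/epoch_2 are just illustrative names):

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=40, horizontal_flip=True)

# one fake 150x150 RGB image, as a batch of size 1
x = np.random.rand(1, 150, 150, 3)

# each call to next() corresponds to drawing this image in a new epoch:
# the transformation parameters are re-sampled, so the results differ
flow = datagen.flow(x, batch_size=1, shuffle=False)
epoch_1 = next(flow)[0]
epoch_2 = next(flow)[0]
print(np.allclose(epoch_1, epoch_2))  # almost always False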

In this way, the number of times each image is augmented is equal to the number of training epochs. Recall from the example that you linked:

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(X_train, Y_train,
                    batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch,
                    validation_data=(X_test, Y_test))

Here, the datagen object will expose the training set to the model nb_epoch times, so each image will be augmented nb_epoch times. This way the learning algorithm almost never sees exactly the same training example twice, because at each epoch the training examples are randomly transformed.
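If you want the model to see more than one augmented copy of each image within a single epoch, you can simply ask fit_generator for more samples per epoch than there are originals, since the generator loops over the data indefinitely. A hedged sketch, reusing model, X_train, Y_train, datagen, batch_size and nb_epoch from the example above; augmentation_factor is a hypothetical name, not a Keras parameter:

augmentation_factor = 5  # hypothetical: augmented copies of each original image per epoch

# samples_per_epoch tells fit_generator how many samples make up one epoch;
# asking for more samples than there are originals just draws extra
# randomly transformed copies from datagen.flow()
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0] * augmentation_factor,
                    nb_epoch=nb_epoch,
                    validation_data=(X_test, Y_test))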

Sergii Gryshkevych answered Oct 19 '22