
Is data augmentation in Keras applied to the validation set when using ImageDataGenerator and flow_from_directory?

I am training a deep neural network using ImageDataGenerator and flow_from_directory in Keras. All of the data is in one folder, so I pass validation_split=0.x when creating the ImageDataGenerator. I then create two flows with flow_from_directory, one for training (subset="training") and one for validation (subset="validation").

I am wondering whether the image augmentation (transformation) specified when creating the ImageDataGenerator is applied to both the training and validation subsets, or only to the training one.

I can't find the relevant section of the Keras repository on GitHub to check this.

(Note: I know it's a better practice to have two separate directories for training and validation with two separate generators)

Code Example:

img_gen = ImageDataGenerator(validation_split=0.2, horizontal_flip=True, vertical_flip=True, ...)
train_flow = img_gen.flow_from_directory('directory', subset="training", ...)
validation_flow = img_gen.flow_from_directory('directory', subset="validation", ...)
history = model.fit_generator(generator=train_flow, validation_data=validation_flow, ...)

sidrat28 asked Jun 07 '18 09:06


1 Answer

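Yes. If the same ImageDataGenerator is used with flow_from_directory for both the training and validation flows, the validation data is augmented as well. The subset argument only selects which files a flow reads; the transformations come from the generator, and both flows share it. This matches the Keras documentation, which under the ImageDataGenerator methods describes flow_from_directory as: "Takes the path to a directory & generates batches of augmented data." If you do not want to augment the validation set, use a second generator without augmentation settings, as in the example provided in the documentation: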

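from keras.preprocessing.image import ImageDataGenerator

# Training generator: rescaling plus random augmentation.
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# Validation generator: rescaling only, no augmentation.
validation_datagen = ImageDataGenerator(rescale=1./255)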

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

validation_generator = validation_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

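# `model` is a compiled Keras model defined elsewhere.
# In newer TensorFlow releases fit_generator is deprecated; model.fit accepts generators directly.
model.fit_generator(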
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
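
If all images live in a single folder, as in the question, a variation on the example above may work: create two generators with the same validation_split, one with augmentation and one without, and point both at the same directory. The following is only a sketch under some assumptions: the tensorflow.keras import path depends on the installed version, build_flows and VALIDATION_SPLIT are hypothetical names, and the two subsets stay disjoint only if both generators use the identical split value (as far as I can tell the split is taken deterministically from the sorted file list, which you can check by comparing the .filenames of both flows).

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumption: both generators must use the identical split fraction,
# otherwise the training and validation files could overlap.
VALIDATION_SPLIT = 0.2

# Augmentation is configured only on the training generator.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=VALIDATION_SPLIT,
)

# The validation generator only rescales; no random transformations.
val_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    validation_split=VALIDATION_SPLIT,
)


def build_flows(directory, target_size=(150, 150), batch_size=32):
    """Hypothetical helper: return (train_flow, val_flow) for one directory."""
    train_flow = train_datagen.flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode="binary",
        subset="training",
    )
    val_flow = val_datagen.flow_from_directory(
        directory,
        target_size=target_size,
        batch_size=batch_size,
        class_mode="binary",
        subset="validation",
        shuffle=False,  # keep the validation order stable between epochs
    )
    return train_flow, val_flow
```

Calling train_flow, val_flow = build_flows('directory') then gives two flows that can be passed as the generator and validation_data arguments, with only the training batches being flipped.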

Note: Because the validation data is only rescaled here, you could also pass it directly as arrays instead of using a generator, e.g.: validation_data=(x_valid, y_valid)
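
A minimal sketch of preparing such arrays, assuming the validation images are already loaded into memory as uint8 NumPy arrays. The helper name rescale_images and the random stand-in data are hypothetical and only there to make the snippet run; the division by 255 mirrors rescale=1./255 from the generators above.

```python
import numpy as np


def rescale_images(images):
    """Hypothetical helper: convert uint8 images to float32 in [0, 1]."""
    return np.asarray(images, dtype="float32") / 255.0


# Stand-in data (assumption): in practice these would be loaded from disk.
x_valid_raw = np.random.randint(0, 256, size=(8, 150, 150, 3), dtype=np.uint8)
y_valid = np.random.randint(0, 2, size=(8,))

# Same preprocessing the validation generator would apply, and nothing more.
x_valid = rescale_images(x_valid_raw)

# The tuple can then be handed to training as:
#     validation_data=(x_valid, y_valid)
```

Since no generator is involved, no augmentation can be applied to these arrays, and validation_steps is not needed.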

mgross answered Oct 10 '22 08:10