
Can flow_from_directory get train and validation data from the same directory in Keras?

I got the following example from here.

train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

This example uses two separate directories for training and validation. Is it possible to get the training and validation split from a single directory instead of two separate ones? An example would be appreciated.

BAE asked Oct 29 '18 01:10



3 Answers

You can pass the validation_split argument (a number between 0 and 1) to the ImageDataGenerator constructor to split the data into training and validation sets:

generator = ImageDataGenerator(..., validation_split=0.3)

Then pass the subset argument to flow_from_directory to create the training and validation generators:

train_gen = generator.flow_from_directory(dir_path, ..., subset='training')
val_gen = generator.flow_from_directory(dir_path, ..., subset='validation')

Note: If you have set augmentation parameters for the ImageDataGenerator, then by using this solution both training and validation images will be augmented.

today answered Sep 19 '22 13:09


The above solution applies the same augmentations to both the training and validation sets, which might not be desired (you might not want to apply shear, rotation, zoom, etc. to the validation data). Applying separate training and validation augmentations to images from the same folder is not yet available.

See https://github.com/keras-team/keras/issues/5862 for the full discussion (and some possible ways to handle this). People have usually resorted to scripts that create a new folder for validation, but that is not an exact answer to this question.
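For reference, the kind of script mentioned above could look roughly like this. It is only a sketch using the Python standard library: split_class_folders, its parameters, and the train/validation folder names are hypothetical choices, and it assumes the source directory contains one sub-folder per class, as flow_from_directory expects.

```python
import random
import shutil
from pathlib import Path


def split_class_folders(src_dir, dst_dir, validation_split=0.2, seed=42):
    """Copy src_dir/<class>/* into dst_dir/train/<class> and dst_dir/validation/<class>.

    Returns a dict mapping class name to (n_train, n_validation).
    """
    src, dst = Path(src_dir), Path(dst_dir)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        # Sort first so the shuffle does not depend on filesystem ordering.
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * validation_split)
        subsets = {"validation": files[:n_val], "train": files[n_val:]}
        for subset, subset_files in subsets.items():
            target = dst / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, target / f.name)  # copy; originals stay in place
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts
```

After running it once, the two resulting folders can be passed to two differently configured generators exactly as in the question ('data/train' and 'data/validation'), at the cost of duplicating the images on disk.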

dapperdan answered Sep 21 '22 13:09


As @dapperdan mentioned, the currently accepted solution by @today sends both the training and validation sets through the same transformations, which is fine if you are not planning to do data augmentation. If you do want data augmentation, you will usually want to transform the training data and leave the validation data unaugmented.

To do that, create two ImageDataGenerators, each with the transformations required for its data, and then select the subsets using flow_from_directory with the same validation_split and seed.

# Depending on your install, this may be keras.preprocessing.image instead.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Validation ImageDataGenerator: rescaling only, no augmentation.
valid_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Training ImageDataGenerator: rescaling plus augmentation transforms.
# validation_split matches the value used for valid_datagen.
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2,
                                   rotation_range=15, shear_range=10,
                                   zoom_range=0.1, fill_mode='nearest',
                                   height_shift_range=0.05, width_shift_range=0.1)

# Create a flow from the directory for validation data: subset='validation', seed=42.
# dir_path and img_shape, e.g. (150, 150), are assumed to be defined beforehand.
valid_gen = valid_datagen.flow_from_directory(dir_path, subset='validation',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)

# Create a flow from the same directory using the same seed and the 'training' subset.
train_gen = train_datagen.flow_from_directory(dir_path, subset='training',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)
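
Because the two generators are built independently, it can be worth confirming that they really cover disjoint sets of files before training. The following is a minimal sketch: check_split is a hypothetical helper, and it assumes the iterators returned by flow_from_directory expose a filenames list, as the Keras DirectoryIterator does.

```python
def check_split(train_filenames, valid_filenames):
    """Return (n_train, n_valid); raise ValueError if a file is in both subsets."""
    train, valid = set(train_filenames), set(valid_filenames)
    overlap = train & valid
    if overlap:
        examples = sorted(overlap)[:3]
        raise ValueError(
            f"{len(overlap)} file(s) appear in both subsets, e.g. {examples}"
        )
    return len(train), len(valid)


# With the generators above, this would be called as:
# n_train, n_valid = check_split(train_gen.filenames, valid_gen.filenames)
```

If the call returns without raising, the two generators do not share any image, and the returned counts show how many images ended up on each side of the split.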

Hassan el-Hajj answered Sep 19 '22 13:09