I got the following example from here.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
This uses two separate directories for training and validation. I'm just curious: can I split the training and validation data from the same directory instead of using two separate directories? Is there an example?
You can pass the validation_split argument (a number between 0 and 1) to the ImageDataGenerator class instance to split the data into training and validation sets:

generator = ImageDataGenerator(..., validation_split=0.3)

Then pass the subset argument to flow_from_directory to get the training and validation generators:

train_gen = generator.flow_from_directory(dir_path, ..., subset='training')
val_gen = generator.flow_from_directory(dir_path, ..., subset='validation')
Note: If you have set augmentation parameters on the ImageDataGenerator, then with this solution both the training and validation images will be augmented.
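To illustrate why both subsets can come from a single directory, here is a minimal, hypothetical sketch of deterministic fractional splitting. This is not Keras' actual implementation; it only shows the idea that a fixed validation_split fraction slices one ordered file list into two disjoint parts, so each generator sees its own subset:

```python
# Hypothetical sketch (not Keras internals): a fixed fraction of a
# deterministically ordered file list is reserved for validation, and
# the rest is used for training. Both subsets come from one directory.

def split_files(filenames, validation_split=0.3):
    """Return (training, validation) lists from one file list."""
    files = sorted(filenames)              # deterministic order
    n_val = int(len(files) * validation_split)
    return files[n_val:], files[:n_val]    # disjoint slices

train, val = split_files([f"img_{i}.png" for i in range(10)], 0.3)
print(len(train), len(val))  # 7 3
```

Because the split is positional rather than random, calling it twice with the same fraction always yields the same disjoint partition.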
The above solution applies the same augmentations to both the training and validation sets, which may not be desired (you might not want to apply shear, rotation, zoom, etc. to the validation data). Separate training and validation augmentations from the same folder are not yet available.
See https://github.com/keras-team/keras/issues/5862 for the full discussion (and some possible ways to handle this). People have usually resorted to scripts that create a new folder for validation, but that is not an exact answer to this question.
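As a hedged sketch of the script approach mentioned above, the following moves a random fraction of each class folder into a parallel validation tree. The paths, class-folder layout, and 0.2 fraction are illustrative assumptions, not from the original answer:

```python
# Illustrative sketch: carve a validation set out of a single
# class-per-subfolder directory tree by physically moving files.
# After this, training and validation can use fully independent
# ImageDataGenerators on the two directories.
import os
import random
import shutil

def carve_validation(src_root, dst_root, fraction=0.2, seed=42):
    random.seed(seed)  # reproducible selection
    for cls in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, cls)
        if not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(dst_root, cls)
        os.makedirs(dst_dir, exist_ok=True)
        files = sorted(os.listdir(src_dir))
        n_val = int(len(files) * fraction)
        for name in random.sample(files, n_val):
            shutil.move(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))

# Example (hypothetical paths):
# carve_validation('data/all', 'data/validation', fraction=0.2)
```

This side-steps the shared-augmentation problem entirely, at the cost of modifying the dataset on disk.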
As @dapperdan mentioned, the currently accepted solution by @today means that both the training and validation sets go through the same transformations, which is fine if you are not planning to do data augmentation. If you do want data augmentation, you should transform the training data and leave the validation data unaugmented.
To do that, create two ImageDataGenerators with the appropriate transformations for each set, then select subsets using flow_from_directory with the same seed:
# Validation ImageDataGenerator with rescaling only.
valid_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Training ImageDataGenerator with augmentation transforms.
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2,
                                   rotation_range=15, shear_range=10,
                                   zoom_range=0.1, fill_mode='nearest',
                                   height_shift_range=0.05,
                                   width_shift_range=0.1)

# Create a flow from the directory for the validation data,
# using subset='validation' and seed=42.
valid_gen = valid_datagen.flow_from_directory(dir_path, subset='validation',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)

# Create a flow from the same directory using the same seed
# and subset='training'.
train_gen = train_datagen.flow_from_directory(dir_path, subset='training',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)