I got the following example from here.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    'data/validation',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
This uses two separate directories for training and validation. I'm just curious: can I split the training and validation data from the same directory instead of using two separate directories? Is there an example?
You can pass the validation_split argument (a number between 0 and 1) to the ImageDataGenerator class instance to split the data into training and validation sets:

generator = ImageDataGenerator(..., validation_split=0.3)

Then pass the subset argument to flow_from_directory to get the training and validation generators:

train_gen = generator.flow_from_directory(dir_path, ..., subset='training')
val_gen = generator.flow_from_directory(dir_path, ..., subset='validation')
Note: If you have set augmentation parameters on the ImageDataGenerator, then with this solution both the training and validation images will be augmented.
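To illustrate why both subsets can come from a single directory, here is a minimal, hypothetical sketch of deterministic fractional splitting. This is not Keras' actual implementation; it only shows the idea that a fixed validation_split fraction slices one ordered file list into two disjoint parts, so each generator sees its own subset:

```python
# Hypothetical sketch (not Keras internals): a fixed fraction of a
# deterministically ordered file list is reserved for validation, and
# the rest is used for training. Both subsets come from one directory.

def split_files(filenames, validation_split=0.3):
    """Return (training, validation) lists from one file list."""
    files = sorted(filenames)              # deterministic order
    n_val = int(len(files) * validation_split)
    return files[n_val:], files[:n_val]    # disjoint slices

train, val = split_files([f"img_{i}.png" for i in range(10)], 0.3)
print(len(train), len(val))  # 7 3
```

Because the split is positional rather than random, calling it twice with the same fraction always yields the same disjoint partition.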
The above solution applies the same augmentations to both the training and validation sets, which may not be desired (you might not want to apply shear, rotation, zoom, etc. to the validation data). Separate training and validation augmentations from the same folder are not yet available.
See https://github.com/keras-team/keras/issues/5862 for the full discussion (and some possible ways to handle this). People have usually resorted to scripts that create a new folder for validation, but that is not an exact answer to this question.
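As a hedged sketch of the script approach mentioned above, the following moves a random fraction of each class folder into a parallel validation tree. The paths, class-folder layout, and 0.2 fraction are illustrative assumptions, not from the original answer:

```python
# Illustrative sketch: carve a validation set out of a single
# class-per-subfolder directory tree by physically moving files.
# After this, training and validation can use fully independent
# ImageDataGenerators on the two directories.
import os
import random
import shutil

def carve_validation(src_root, dst_root, fraction=0.2, seed=42):
    random.seed(seed)  # reproducible selection
    for cls in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, cls)
        if not os.path.isdir(src_dir):
            continue
        dst_dir = os.path.join(dst_root, cls)
        os.makedirs(dst_dir, exist_ok=True)
        files = sorted(os.listdir(src_dir))
        n_val = int(len(files) * fraction)
        for name in random.sample(files, n_val):
            shutil.move(os.path.join(src_dir, name),
                        os.path.join(dst_dir, name))

# Example (hypothetical paths):
# carve_validation('data/all', 'data/validation', fraction=0.2)
```

This side-steps the shared-augmentation problem entirely, at the cost of modifying the dataset on disk.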
As @dapperdan mentioned, the currently accepted solution by @today means that both the training and validation sets go through the same transformations, which is fine if you are not planning to do data augmentation. If you do want data augmentation, you should transform the training data and leave the validation data unaugmented.
To do that, create two ImageDataGenerators with the appropriate transformations for each set, then select subsets using flow_from_directory with the same seed:
# Validation ImageDataGenerator with rescaling only.
valid_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Training ImageDataGenerator with augmentation transforms.
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2,
                                   rotation_range=15, shear_range=10,
                                   zoom_range=0.1, fill_mode='nearest',
                                   height_shift_range=0.05,
                                   width_shift_range=0.1)

# Create a flow from the directory for the validation data,
# using subset='validation' and seed=42.
valid_gen = valid_datagen.flow_from_directory(dir_path, subset='validation',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)

# Create a flow from the same directory using the same seed
# and subset='training'.
train_gen = train_datagen.flow_from_directory(dir_path, subset='training',
                                              shuffle=True, seed=42,
                                              target_size=img_shape,
                                              batch_size=64)