Fit Image augmentations to training data using flow_from_directory

I want to use Image augmentation in Keras. My current code looks like this:

from keras.preprocessing.image import ImageDataGenerator

# define image augmentations
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    zca_whitening=True)

# generate image batches from directory
train_datagen.flow_from_directory(train_dir)

When I run a model with this, I get the following error:

"ImageDataGenerator specifies `featurewise_std_normalization`, but it hasn't been fit on any training data."

But I couldn't find clear information on how to use train_datagen.fit() together with flow_from_directory.

asked Oct 12 '17 by Mario Kreutzfeldt
1 Answer

You are right, the docs are not very enlightening on this ...

What you need is actually a 4-step process:

  1. Define your data augmentation
  2. Fit the augmentation
  3. Set up your generator using flow_from_directory()
  4. Train your model with fit_generator()

Here is the necessary code for a hypothetical image classification case:

# define data augmentation configuration
train_datagen = ImageDataGenerator(featurewise_center=True,
                                   featurewise_std_normalization=True,
                                   zca_whitening=True)

# fit the data augmentation (x_train must be an array of images already loaded in memory)
train_datagen.fit(x_train)

# setup generator
train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode='categorical')

# train model
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,  # number of batches per epoch
    epochs=epochs,
    validation_data=validation_generator,  # optional - if used, needs to be defined
    validation_steps=nb_validation_samples // batch_size)

Clearly, there are several parameters to be defined (train_data_dir, nb_train_samples, etc.), but hopefully you get the idea.

If you also need to use a validation_generator, as in my example, it should be defined the same way as your train_generator; a sketch is shown below.
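
For instance, here is a minimal sketch, assuming a hypothetical validation_data_dir and reusing the image dimensions and batch size defined above; note that the featurewise statistics are still fitted on the training data x_train, so that validation images are normalized the same way:

# same preprocessing configuration as for training
validation_datagen = ImageDataGenerator(featurewise_center=True,
                                        featurewise_std_normalization=True,
                                        zca_whitening=True)

# use the training statistics for normalization
validation_datagen.fit(x_train)

# setup validation generator (validation_data_dir is a hypothetical path)
validation_generator = validation_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode='categorical')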

UPDATE (after comment)

Step 2 needs some discussion; here, x_train is the actual training data, which, ideally, should fit into main memory. Also, according to the documentation, this step is

Only required if featurewise_center or featurewise_std_normalization or zca_whitening.
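
Conversely, if you only use augmentations that don't need dataset-wide statistics, you can skip fit() altogether; here is a minimal sketch (the specific parameter values are just an illustration):

# no featurewise statistics requested, so no fit() call is needed
simple_datagen = ImageDataGenerator(rescale=1./255,
                                    rotation_range=20,
                                    horizontal_flip=True)

simple_generator = simple_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        class_mode='categorical')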

However, there are many real-world cases where the requirement that all the training data fit into memory is clearly unrealistic. How you center/normalize/whiten data in such cases is a (huge) sub-field in itself, and arguably the main reason for the existence of big data processing frameworks such as Spark.

So, what to do in practice here? Well, the next logical action in such a case is to sample your data; indeed, this is exactly what the community advises - here is Keras creator Francois Chollet on Working with large datasets like Imagenet:

datagen.fit(X_sample) # let's say X_sample is a small-ish but statistically representative sample of your data

And another quote from an ongoing open discussion about extending ImageDataGenerator (emphasis added):

fit is required for feature-wise standardization and ZCA, and it only takes an array as a parameter; there is no fit for a directory. For now, we need to manually read a subset of the images to do this fit for a directory. One idea is that we could change fit() to accept the generator itself (flow_from_directory); of course, standardization should be disabled during fit.
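
To make that workaround concrete, here is a minimal sketch, where load_image_sample is a hypothetical helper that loads a random subset of the images under train_data_dir into an array before calling fit():

import os
import random
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def load_image_sample(directory, sample_size, target_size):
    # collect all image paths from the class sub-folders
    paths = [os.path.join(root, f)
             for root, _, files in os.walk(directory)
             for f in files if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
    # load a random, hopefully representative, subset into memory
    sample_paths = random.sample(paths, min(sample_size, len(paths)))
    images = [img_to_array(load_img(p, target_size=target_size)) for p in sample_paths]
    return np.stack(images)

# fit the featurewise statistics on the sample only
x_sample = load_image_sample(train_data_dir, sample_size=1000,
                             target_size=(img_height, img_width))
train_datagen.fit(x_sample)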

answered Sep 22 '22 by desertnaut