I set `featurewise_center=True` and then use `flow_from_directory` to set up my training and validation data in Keras. However, I got this warning:

UserWarning: This ImageDataGenerator specifies `featurewise_center`, but it hasn't been fit on any training data. Fit it first by calling `.fit(numpy_data)`.

Is there any way I can use `flow_from_directory` and then fit the data as required?
When an image is rotated, part of the output falls outside the original frame and that empty area has to be filled. You can fill it in different ways, such as with a constant value or with the nearest pixel values. This is controlled by the `fill_mode` argument, whose default value, "nearest", simply replaces the empty area with the nearest pixel values.
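The "nearest" rule can be illustrated without Keras. The sketch below is only an illustration of the fill rule on a simple horizontal shift, not Keras' own implementation (Keras applies a general affine transform); the helper name `shift_right_nearest` is hypothetical. NumPy's `np.pad(..., mode="edge")` repeats the border value, which is exactly the "nearest pixel" idea.

```python
import numpy as np


def shift_right_nearest(img: np.ndarray, k: int) -> np.ndarray:
    """Shift a 2-D image k pixels to the right and fill the vacated
    columns with the nearest valid pixel (the original left-edge column).

    Hypothetical helper that only illustrates the idea behind
    fill_mode="nearest"; it is not how Keras implements it.
    """
    # Pad k columns on the left by repeating the edge value ("edge" == nearest).
    padded = np.pad(img, ((0, 0), (k, 0)), mode="edge")
    # Keep the original width: drop the k columns pushed past the right edge.
    return padded[:, : img.shape[1]]


img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])
shifted = shift_right_nearest(img, 2)
print(shifted)
# [[1 1 1 2]
#  [5 5 5 6]]
```

The two empty columns on the left are filled with copies of the original first column, which is what "replace the empty area with the nearest pixel values" means.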
For example, if your dataset has 1000 images and the batch size is 10, then ImageDataGenerator produces 10 images in each iteration (batch) of training. The number of iterations per epoch, `steps_per_epoch`, is the total number of samples divided by the batch size, here 1000 / 10 = 100.
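The arithmetic can be written as a tiny helper. This is a minimal sketch; the name `steps_per_epoch` is just a descriptive choice here, and rounding up for dataset sizes that are not a multiple of the batch size is an assumption (it counts the final partial batch as one more step).

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # Round up so that a final, smaller batch still counts as one step.
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(1000, 10))  # 100
print(steps_per_epoch(1005, 10))  # 101
```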
The `flow_from_directory` method is useful when the images are sorted into one folder per class/label. It infers the classes automatically from the folder names.
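Conceptually, inferring the classes amounts to listing the subfolders and numbering them. The sketch below assumes the alphanumeric ordering that the Keras documentation describes for inferred classes; the helper `infer_class_indices` is hypothetical and only mimics the mapping that Keras exposes as `class_indices` on the returned iterator.

```python
import tempfile
from pathlib import Path


def infer_class_indices(directory: str) -> dict:
    """Map each subfolder name to an integer label, sorted alphanumerically.

    Hypothetical helper that mimics how class labels are inferred from
    folder names; it does not read any images.
    """
    classes = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Create one folder per class, in non-sorted order on purpose.
    for name in ["dogs", "cats"]:
        (Path(root) / name).mkdir()
    indices = infer_class_indices(root)

print(indices)  # {'cats': 0, 'dogs': 1}
```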
Random zoom uses the `zoom_range` argument of the ImageDataGenerator class. The zoom can be given either as a float or as a range in the form of a list or a Python tuple. If the zoom is specified as a float value `z`, it is interpreted as the range `[1 - z, 1 + z]`.
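A minimal sketch of the rule above: the hypothetical helper `zoom_interval` turns a `zoom_range` value into the `[lower, upper]` interval from which, as I understand it, Keras then draws a random zoom factor for each image. It mirrors the documented behavior and is not Keras' own code.

```python
def zoom_interval(zoom_range):
    """Turn a zoom_range value into a [lower, upper] interval.

    Hypothetical helper: a float z becomes [1 - z, 1 + z];
    a two-element list or tuple is used as-is.
    """
    if isinstance(zoom_range, (int, float)):
        return [1 - zoom_range, 1 + zoom_range]
    if isinstance(zoom_range, (list, tuple)) and len(zoom_range) == 2:
        return [zoom_range[0], zoom_range[1]]
    raise ValueError("zoom_range should be a float or a [lower, upper] pair")


print(zoom_interval(0.2))         # [0.8, 1.2]
print(zoom_interval((0.5, 1.0)))  # [0.5, 1.0]
```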
A similar warning appears for standard deviation normalization: "ImageDataGenerator specifies `featurewise_std_normalization`, but it hasn't been fit on any training data." But I didn't find clear information about how to use `train_datagen.fit()` together with `flow_from_directory`.
The ImageDataGenerator class calls centering with the mean calculated on the training dataset "feature-wise centering". It requires that this statistic be calculated on the training dataset before the scaling is applied.
Sometimes the datasets we download contain one folder of data per class. To use the `flow` method, one may first need to collect the data and the corresponding labels into arrays and then call `flow` on those arrays, which is a tedious task overall.
`featurewise_center` transforms the images to zero mean (one mean value per channel, computed over the whole dataset) using the formula `X = X - mean(X)`. But for ImageDataGenerator to do this transformation, it needs to know the mean of the dataset, and the `fit` method on ImageDataGenerator does exactly this: it calculates these statistics.
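The computation can be sketched in plain NumPy. To my understanding, Keras' `fit()` averages over the sample, row and column axes for channels-last data, giving one mean per channel reshaped so it broadcasts over an image; the helper name `fit_channel_mean` and the random data are assumptions for illustration only.

```python
import numpy as np


def fit_channel_mean(x: np.ndarray) -> np.ndarray:
    """x has shape (num_images, height, width, channels).

    Return the per-channel mean, shaped (1, 1, channels) so that it
    broadcasts over a single image or a whole batch.
    """
    return x.mean(axis=(0, 1, 2)).reshape(1, 1, x.shape[-1])


rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(8, 4, 4, 3)).astype("float32")

mean = fit_channel_mean(images)
centered = images - mean  # X = X - mean(X), applied per channel

print(mean.shape)                                                # (1, 1, 3)
print(np.allclose(centered.mean(axis=(0, 1, 2)), 0, atol=1e-3))  # True
```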
As the Keras docs explain:

> Fits the data generator to some sample data. This computes the internal data stats related to the data-dependent transformations, based on an array of sample data.
If the dataset can be fully loaded into memory, we can do so by loading all the images into a NumPy array and running `fit` on it.
Sample code (RGB images of 256x256):
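```python
from keras.layers import Conv2D, Dense, Flatten
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
from pathlib import Path
from PIL import Image

height = width = 256


def read_pil_image(img_path, height, width):
    # Load one image as an RGB uint8 array resized to (height, width).
    with open(img_path, 'rb') as f:
        return np.array(Image.open(f).convert('RGB').resize((width, height)))


def load_all_images(dataset_path, height, width, img_ext='png'):
    # Recursively read every *.<img_ext> file under dataset_path into one
    # array of shape (num_images, height, width, 3).
    return np.array([read_pil_image(str(p), height, width)
                     for p in Path(dataset_path).rglob("*." + img_ext)])


# fit() computes the dataset mean that featurewise_center needs.
train_datagen = ImageDataGenerator(featurewise_center=True)
train_datagen.fit(load_all_images('./images/', height, width))

# flow_from_directory then yields batches centered with that mean.
train_generator = train_datagen.flow_from_directory(
    './images/',
    target_size=(height, width),
    batch_size=32,
    class_mode='binary',
    color_mode='rgb')

model = Sequential()
model.add(Conv2D(1, (3, 3), input_shape=(height, width, 3)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))  # probability output for binary_crossentropy
model.compile('adam', 'binary_crossentropy')

# Older Keras versions used model.fit_generator(train_generator);
# newer versions accept the generator in model.fit directly.
model.fit(train_generator)
```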
But what if the data cannot be fully loaded into memory? One approach is to sample images randomly from the dataset and run `fit` on that sample only.
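A minimal sketch of the sampling step, using only the standard library. The helper `sample_image_paths` and its parameters are hypothetical; the idea is to pick a random subset of file paths so that only those images need to be read (for example with the answer's `read_pil_image`) and stacked into the array passed to `train_datagen.fit(...)`. Sorting the paths before sampling is a design choice that makes a fixed `seed` reproducible across filesystems.

```python
import random
import tempfile
from pathlib import Path


def sample_image_paths(dataset_path, n, img_ext="png", seed=None):
    """Return up to n randomly chosen image paths under dataset_path.

    Hypothetical helper: searches recursively (like the answer's
    load_all_images) but returns only a random subset of the paths.
    """
    # Sort first so the same seed always yields the same sample.
    paths = sorted(Path(dataset_path).rglob("*." + img_ext))
    rng = random.Random(seed)
    return rng.sample(paths, min(n, len(paths)))


with tempfile.TemporaryDirectory() as root:
    # Fake dataset: two class folders with 10 empty .png files each.
    for cls in ["cats", "dogs"]:
        (Path(root) / cls).mkdir()
        for i in range(10):
            (Path(root) / cls / f"{i}.png").touch()
    first = sample_image_paths(root, 5, seed=1)
    second = sample_image_paths(root, 5, seed=1)
    too_many = sample_image_paths(root, 100, seed=1)

print(len(first), len(too_many))  # 5 20
```

With a real dataset, the sampled paths would then be loaded, e.g. `np.array([read_pil_image(str(p), height, width) for p in first])`, and that array passed to `fit`. The mean estimated from a sample is only an approximation of the full-dataset mean.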
Normally we compute the mean on the training data only and use that same mean to normalize the validation/test data. Doing the same via the data generator is a bit tricky.