How does data normalization work in keras during prediction?

Tags: python, machine-learning, neural-network, tensorflow, keras

I see that ImageDataGenerator lets me specify different styles of data normalization, e.g. featurewise_center, samplewise_center, etc.

From the examples, I see that if I specify one of the featurewise options, I need to call the generator's fit method so that it can compute statistics such as the mean image of the training data.

from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

nb_classes = 10
nb_epoch = 50  # example value
batch_size = 32

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = to_categorical(y_train, nb_classes)
Y_test = to_categorical(y_test, nb_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fits the model on batches with real-time data augmentation
# (model is a compiled Keras model defined elsewhere):
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    steps_per_epoch=len(X_train) // batch_size, epochs=nb_epoch)
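For intuition about what datagen.fit computes for featurewise_center and featurewise_std_normalization, here is a minimal NumPy sketch, assuming channels-last images of shape (N, H, W, C) as CIFAR-10 provides. To my understanding, recent Keras versions reduce over the sample, row and column axes, giving one mean and one standard deviation per channel, stored in a broadcastable shape such as (1, 1, 3). The helper name featurewise_stats is hypothetical and this is not Keras's actual source.

```python
import numpy as np


def featurewise_stats(x):
    """Per-channel mean and std over samples, rows and columns (hypothetical helper).

    x: array of shape (N, H, W, C), channels last.
    Returns two arrays of shape (1, 1, C) so they broadcast against
    a single image (H, W, C) or a batch (N, H, W, C).
    """
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=(0, 1, 2)).reshape(1, 1, -1)
    std = x.std(axis=(0, 1, 2)).reshape(1, 1, -1)
    return mean, std


# Random stand-in for X_train: 100 RGB images of 32x32 pixels in [0, 255].
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.uint8)
mean, std = featurewise_stats(x_train)
print(mean.shape, std.shape)  # (1, 1, 3) (1, 1, 3)

# Applying the training statistics gives roughly zero mean and unit std per channel.
x_norm = (x_train.astype(np.float64) - mean) / (std + 1e-6)
```

The same two arrays are all you need to normalize test images at prediction time, which is why they have to be kept somewhere outside the model.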

My question is: how does prediction work if I have specified data normalization during training? I can't see how the framework would even let me pass the training set mean/standard deviation along to predict so that I could normalize my test data myself, and I also don't see where in the training code this information is stored.

Are the image statistics needed for normalization stored in the model so that they can be used during prediction?


asked Jan 25 '17 15:01

Alex Taylor

4 Answers

Yes, this is a really big downside of Keras's ImageDataGenerator: you can't provide the standardization statistics on your own. But there is an easy way to overcome this issue.

Assuming you have a function normalize(x) that normalizes a batch of images (remember that the generator yields not a single image but an array of images, i.e. a batch with shape (nr_of_examples_in_batch, image_dims ...)), you can build your own generator with normalization:

def gen_with_norm(gen, normalize):
    for x, y in gen:
        yield normalize(x), y

Then you can simply use gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize) instead of datagen.flow(X_train, Y_train, batch_size=32).
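The wrapper can be exercised without Keras. In the sketch below, make_normalize and fake_flow are hypothetical names: make_normalize builds a normalize function from precomputed per-channel statistics, and fake_flow only imitates the endless stream of (x_batch, y_batch) tuples that datagen.flow(...) yields. The statistics are assumed to have the (1, 1, C) shape that featurewise fitting produces.

```python
import numpy as np


def gen_with_norm(gen, normalize):
    """Wrap any (x, y) batch generator and normalize each x batch."""
    for x, y in gen:
        yield normalize(x), y


def make_normalize(mean, std, eps=1e-6):
    """Hypothetical helper: build a batch normalizer from precomputed statistics."""
    def normalize(x):
        return (x.astype(np.float32) - mean) / (std + eps)
    return normalize


def fake_flow(x, y, batch_size):
    """Hypothetical stand-in for datagen.flow: yields batches forever."""
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]


rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(10, 4, 4, 3)).astype(np.uint8)
y = np.arange(10)
# Per-channel statistics with shape (1, 1, 3), cast to float32 like Keras's default floatx.
mean = x.mean(axis=(0, 1, 2), keepdims=True)[0].astype(np.float32)
std = x.std(axis=(0, 1, 2), keepdims=True)[0].astype(np.float32)

gen = gen_with_norm(fake_flow(x, y, batch_size=4), make_normalize(mean, std))
xb, yb = next(gen)
print(xb.shape, xb.dtype, yb)  # (4, 4, 4, 3) float32 [0 1 2 3]
```

Because the wrapper just iterates over whatever it is given, it keeps the endless behavior of the inner generator, which is what fit_generator expects.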

Moreover, you can recover the mean and std computed by the fit method from the corresponding attributes of datagen (datagen.mean and datagen.std).
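Since these statistics live on the generator rather than in the saved model, one option is to persist them next to the model and reload them at prediction time. This is a minimal sketch, assuming mean and std are arrays taken from datagen.mean and datagen.std after fit; the file name norm_stats.npz, the helper names and the statistic values are arbitrary placeholders, not real CIFAR-10 numbers.

```python
import os
import tempfile

import numpy as np


def save_stats(path, mean, std):
    """Store training-set statistics (e.g. datagen.mean / datagen.std) in an .npz file."""
    np.savez(path, mean=mean, std=std)


def load_stats(path):
    """Load statistics previously written by save_stats."""
    with np.load(path) as data:
        return data["mean"], data["std"]


def normalize_for_prediction(x, mean, std, eps=1e-6):
    """Apply featurewise centering and std normalization with training statistics."""
    return (np.asarray(x, dtype=np.float32) - mean) / (std + eps)


# Arbitrary stand-ins for datagen.mean / datagen.std, shaped (1, 1, 3) for channels-last images.
mean = np.array([100.0, 110.0, 120.0], dtype=np.float32).reshape(1, 1, 3)
std = np.array([50.0, 55.0, 60.0], dtype=np.float32).reshape(1, 1, 3)

path = os.path.join(tempfile.mkdtemp(), "norm_stats.npz")
save_stats(path, mean, std)

# Later, at prediction time (possibly in another process):
mean2, std2 = load_stats(path)
x_new = np.full((2, 32, 32, 3), 100, dtype=np.uint8)  # dummy images
x_ready = normalize_for_prediction(x_new, mean2, std2)
# x_ready would then be passed to model.predict(x_ready)
```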

answered Oct 12 '22 09:10

Marcin Możejko

Use the generator's standardize method on each element. Here is a complete example for CIFAR-10:

#!/usr/bin/env python

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
img_rows, img_cols, img_channels = 32, 32, 3
num_classes = 10

batch_size = 32
epochs = 1

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

datagen = ImageDataGenerator(zca_whitening=True)

# Compute principal components required for ZCA
datagen.fit(x_train)

# Apply normalization (ZCA and others)
print(x_test.shape)
for i in range(len(x_test)):
    # this is what you are looking for
    x_test[i] = datagen.standardize(x_test[i])
print(x_test.shape)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train,
                                 batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))
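To show why standardize can whiten test images using only training data, here is a minimal NumPy sketch of ZCA whitening. To my understanding, Keras's fit with zca_whitening=True centers the data, computes the covariance of the flattened images, takes its SVD and stores a principal_components matrix that standardize multiplies each flattened image by. The sketch follows that general technique but is not Keras's source; fit_zca, apply_zca, the toy data and eps are assumptions.

```python
import numpy as np


def fit_zca(x_train, eps=1e-6):
    """Learn a ZCA whitening transform from training data only (hypothetical helper).

    x_train: array of shape (N, ...); each sample is flattened to a vector.
    Returns (mean, components) where components is a (D, D) matrix.
    """
    flat = x_train.reshape(len(x_train), -1).astype(np.float64)
    mean = flat.mean(axis=0)
    centered = flat - mean
    sigma = centered.T @ centered / centered.shape[0]  # covariance (D, D)
    u, s, _ = np.linalg.svd(sigma)
    # ZCA: rotate into the eigenbasis, rescale by 1/sqrt(eigenvalue + eps), rotate back.
    components = (u / np.sqrt(s + eps)) @ u.T
    return mean, components


def apply_zca(x_batch, mean, components):
    """Whiten a batch (N, ...) with statistics learned on the training set."""
    flat = x_batch.reshape(len(x_batch), -1) - mean
    return (flat @ components).reshape(x_batch.shape)


rng = np.random.default_rng(0)
# Correlated toy data: 2000 "images" of shape (2, 2, 3), i.e. 12 features.
q, _ = np.linalg.qr(rng.normal(size=(12, 12)))
mixing = q * np.linspace(0.5, 3.0, 12)  # well-conditioned mixing matrix
x_train = (rng.normal(size=(2000, 12)) @ mixing).reshape(2000, 2, 2, 3)
x_test = (rng.normal(size=(5, 12)) @ mixing).reshape(5, 2, 2, 3)

mean, components = fit_zca(x_train)
white_train = apply_zca(x_train, mean, components).reshape(2000, -1)
cov = np.cov(white_train, rowvar=False, bias=True)
print(np.allclose(cov, np.eye(12), atol=1e-3))  # True

# Test images reuse the training mean and components; nothing is refitted.
white_test = apply_zca(x_test, mean, components)
```

A single image can go through the same function as a batch of one, e.g. apply_zca(x_test[:1], mean, components).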

answered Oct 12 '22 09:10

Martin Thoma

I had the same issue and solved it by applying the same operations that ImageDataGenerator uses:

from keras import backend as K
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator

# Load Cifar-10 dataset
(trainX, trainY), (testX, testY) = cifar10.load_data()
# cifar10 returns uint8 arrays; convert so the in-place float arithmetic below works
testX = testX.astype('float32')
generator = ImageDataGenerator(featurewise_center=True,
                               featurewise_std_normalization=True)

# Calculate statistics on train dataset
generator.fit(trainX)
# Apply featurewise_center to test-data with statistics from train data
testX -= generator.mean
# Apply featurewise_std_normalization to test-data with statistics from train data
testX /= (generator.std + K.epsilon())

# Do your regular fitting
model.fit_generator(..., validation_data=(testX, testY), ...)

Note that this is only feasible if you have a reasonably small dataset, like CIFAR-10, that fits in memory. Otherwise the solution proposed by Marcin sounds more reasonable.
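If converting the whole test set to float at once is too memory-hungry, the same arithmetic can be applied chunk by chunk. This is a minimal sketch, assuming mean and std are per-channel arrays like generator.mean and generator.std; normalized_batches is a hypothetical helper, the statistic values are arbitrary stand-ins, and eps=1e-7 matches the default value of K.epsilon().

```python
import numpy as np


def normalized_batches(x, mean, std, batch_size=256, eps=1e-7):
    """Yield float32 normalized slices of x without converting all of x at once.

    Each slice is cast to float32 before the in-place arithmetic, so the
    original (possibly uint8) array is left untouched.
    """
    for start in range(0, len(x), batch_size):
        chunk = x[start:start + batch_size].astype(np.float32)
        chunk -= mean
        chunk /= std + eps
        yield chunk


rng = np.random.default_rng(0)
test_x = rng.integers(0, 256, size=(1000, 32, 32, 3)).astype(np.uint8)
mean = np.full((1, 1, 3), 120.0, dtype=np.float32)  # arbitrary stand-in values
std = np.full((1, 1, 3), 60.0, dtype=np.float32)

total = 0
for chunk in normalized_batches(test_x, mean, std):
    total += len(chunk)  # e.g. preds = model.predict(chunk) in a real pipeline
print(total)  # 1000
```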

answered Oct 12 '22 09:10

Alexander Pacha

I am using the datagen.fit function itself.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
train_datagen.fit(train_data)

test_datagen = ImageDataGenerator(  
    featurewise_center=True, 
    featurewise_std_normalization=True)
test_datagen.fit(train_data)

Because test_datagen is fitted on the training dataset, it learns the training dataset's statistics and then uses them to normalize the test data.
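This works because, with default arguments, fitting the featurewise mean and std is deterministic, so two generators fitted on the same training data end up with identical statistics. The NumPy sketch below imitates that with a hypothetical MiniFeaturewiseGenerator class (not a Keras class). It also shows that copying the fitted mean and std onto a fresh instance gives the same result as refitting, which avoids a second pass over the training data.

```python
import numpy as np


class MiniFeaturewiseGenerator:
    """Hypothetical stand-in imitating featurewise_center + featurewise_std_normalization."""

    def __init__(self, eps=1e-6):
        self.eps = eps
        self.mean = None
        self.std = None

    def fit(self, x):
        # Per-channel statistics over samples, rows and columns (channels last).
        x = np.asarray(x, dtype=np.float64)
        self.mean = x.mean(axis=(0, 1, 2)).reshape(1, 1, -1)
        self.std = x.std(axis=(0, 1, 2)).reshape(1, 1, -1)
        return self

    def standardize(self, x):
        x = np.asarray(x, dtype=np.float64)
        return (x - self.mean) / (self.std + self.eps)


rng = np.random.default_rng(0)
train_data = rng.integers(0, 256, size=(50, 8, 8, 3)).astype(np.uint8)
test_data = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.uint8)

train_gen = MiniFeaturewiseGenerator().fit(train_data)
test_gen = MiniFeaturewiseGenerator().fit(train_data)  # fitted on TRAIN data, as in the answer

# Alternative: copy the statistics instead of fitting a second time.
copied_gen = MiniFeaturewiseGenerator()
copied_gen.mean, copied_gen.std = train_gen.mean, train_gen.std

print(np.allclose(test_gen.standardize(test_data), copied_gen.standardize(test_data)))  # True
```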

answered Oct 12 '22 07:10

Hari
