Keras: rescale=1./255 vs preprocessing_function=preprocess_input - which one to use?

Background

I find quite a lot of code examples where people preprocess their image data either with rescale=1./255 or by setting preprocessing_function to the preprocess_input of the respective model they are using, within the ImageDataGenerator. At first I thought using rescale=1./255 only works when dealing with a pretrained vgg16 model, but I keep seeing examples where it is used with pre-trained resnet50, inception etc. as well.

While the keras-blog (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) uses this approach...

ImageDataGenerator(rescale=1./255, ...

... the Keras docs (https://keras.io/applications/) use this approach:

from keras.applications.vgg19 import preprocess_input
ImageDataGenerator(preprocessing_function=preprocess_input, ...

I thought that using the preprocess_input of the respective model I want to train on is always superior to the rescale=1./255 approach, since it exactly reproduces the preprocessing that was used when the pretrained model was trained.

Question

I need some clarification on when to use rescale=1./255 versus Keras' built-in preprocess_input of the respective model I want to train on, when preprocessing images for transfer learning. Does this only make a difference when using pretrained models, i.e. with loaded weights, versus training from scratch?

asked Feb 15 '19 by AaronDT


3 Answers

I had similar questions, and after running the small experiments below, I think you should always use preprocess_input when using pre-trained models, and use rescale when training from scratch.

Obviously, when you directly use a pre-trained model for inference, you have to use preprocess_input: for example, I tried to use resnet50 on the kaggle dogs-vs-cats dataset, and with rescale=1./255 it returns index 111 (nematode, nematode worm, roundworm) as the most likely class for all images, whereas with preprocess_input it mostly returns indices corresponding to dogs and cats as expected.
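
For reference, here is a minimal sketch of that inference comparison (the image path is a placeholder, and the snippet assumes the keras.applications API of that era):

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image

model = ResNet50(weights='imagenet')

# load a single dogs-vs-cats image at the size ResNet50 expects (path is a placeholder)
img = image.load_img('train/dog.1.jpg', target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img), axis=0)

# with the model's own preprocessing (caffe-style BGR + mean subtraction) the top classes are sensible
print(decode_predictions(model.predict(preprocess_input(x.copy())), top=3))

# with a plain 1./255 rescale the predictions degenerate (the answer above saw class 111 for everything)
print(decode_predictions(model.predict(x / 255.0), top=3))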

Then I tried to use resnet50 with include_top=False, frozen weights from imagenet, one GlobalAveragePooling2D layer and one final dense sigmoid layer. I trained it with Adam on 2000 images of the kaggle dogs-vs-cats dataset (and used 1000 images as validation). Using rescale it does not manage to fit anything after 5 epochs; it always predicts the first class (and, strangely, the training accuracy is 97%, but when I run evaluate_generator on the training images the accuracy is 50%). But with preprocess_input it achieves 98% accuracy on the validation set. Also note that you do not really need the images to be of the same dimensions as those used to train the model: for example, if I use 150 instead of 224, I still get 97.5% accuracy. Without any rescaling or preprocess_input, I got a 95% validation accuracy.
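
A minimal sketch of that frozen-base setup, assuming the usual dogs-vs-cats directory layout (the data/train and data/validation paths are placeholders):

from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# frozen ImageNet base without the classification head
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# one pooling layer plus one sigmoid unit, as in the experiment above
x = GlobalAveragePooling2D()(base.output)
out = Dense(1, activation='sigmoid')(x)
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# the key point: the generator applies the same preprocess_input the base was trained with
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_gen = datagen.flow_from_directory('data/train', target_size=(224, 224), batch_size=32, class_mode='binary')
val_gen = datagen.flow_from_directory('data/validation', target_size=(224, 224), batch_size=32, class_mode='binary')
model.fit_generator(train_gen, epochs=5, validation_data=val_gen)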

I tried the same thing with vgg16: with rescaling it does manage to fit, but only to 87%, vs 97% using preprocess_input and 95% without anything.

Then I trained a small conv network from scratch for 10 epochs: without any preprocessing, or using resnet50's preprocess_input, it does not fit at all, but with rescaling I got a 70% validation accuracy.
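
For contrast, a rough sketch of the from-scratch case; the architecture below is just an illustrative small convnet, not necessarily the one used in the experiment:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# training from scratch: there is no pretrained preprocessing to match, so 1./255 is the natural choice
datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory('data/train', target_size=(150, 150), batch_size=32, class_mode='binary')
model.fit_generator(train_gen, epochs=10)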

answered Oct 16 '22 by Eric


At first I thought using rescale=1./255 only works when dealing with a pretrained vgg16 model, but I keep seeing examples where it is used with pre-trained resnet50, inception etc. as well.

The reason that is done is because you need to NORMALIZE your input. The usual formula for min-max normalization is

x_normalized = (x - min(x)) / (max(x) - min(x))

For 8-bit images the pixel values range from 0 to 255, so min(x) = 0 and max(x) = 255 and the formula reduces to multiplying by

1./255

which leaves the pixel values of the image between 0 and 1.
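
A quick way to see the equivalence, using a random array in place of a real image:

import numpy as np

img = np.random.randint(0, 256, size=(224, 224, 3)).astype('float32')  # stand-in for an 8-bit image

# general min-max normalization
minmax = (img - img.min()) / (img.max() - img.min())

# for 8-bit images the theoretical min and max are 0 and 255, so it reduces to a plain rescale
rescaled = img / 255.0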

The reason for normalizing the input has to do with numerical stability and convergence (technically you do not need it, but with it, the neural network has a higher chance of converging and the gradient descent/adam algorithm is way more likely to be stable)

As for "Does this only make a difference when using pretrained models, i.e. with loaded weights, vs training from scratch?": no, it is not linked to pretrained models only; it is a common technique when using certain machine-learning algorithms (neural networks being one of them).

If you are interested in REALLY understanding what goes on behind all this and why it is so important to normalize, I strongly recommend taking the Andrew Ng course on machine learning.

answered Oct 16 '22 by Juan Antonio Gomez Moriano


If you are using transfer learning in a way where you are only leveraging the network structure but retraining the entire network (maybe starting from the leveraged weights), you may choose to set up your own preprocessing strategy. That means you can scale by dividing by 255.0, use preprocess_input, or even use a custom implementation of preprocessing.

If you are using transfer learning where you are not retraining the entire network but replacing the last layer with a few fully connected dense layers, then it is strongly recommended to use the preprocess_input associated with the network you are training. This is because the weights of the layers you are not training are accustomed to a specific preprocessing step. For example, if you look at preprocess_input for InceptionResNetV2 and follow the code path to _preprocess_numpy_input, it does not normalize the image in every case, only when mode is "tf" or "torch". So if you train on top of an InceptionResNetV2 and normalize the images by dividing by 255, you might not train the classifier the way you intended.
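
As a small sketch of that recommendation: InceptionResNetV2 ships with a "tf"-mode preprocess_input that maps pixels to [-1, 1] rather than [0, 1], so a frozen base should be fed through that same function.

import numpy as np
from keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# "tf"-mode preprocessing maps 0 -> -1, 127.5 -> 0 and 255 -> 1
print(preprocess_input(np.array([[0., 127.5, 255.]])))

# so a generator feeding a frozen InceptionResNetV2 base should use the matching function
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)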

answered Oct 16 '22 by Viman Deb