I find quite a lot of code examples where people preprocess their image data either by using `rescale=1./255` or by setting `preprocessing_function` to the `preprocess_input` of the respective model they are using within `ImageDataGenerator`. At first I thought `rescale=1./255` only works when dealing with a pretrained VGG16 model, but I keep seeing examples where it is used with pretrained ResNet50, Inception, etc. as well.
While the Keras blog (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html) uses this approach...
ImageDataGenerator(rescale=1./255, ...
... the Keras docs (https://keras.io/applications/) use this approach:
from keras.applications.vgg19 import preprocess_input
ImageDataGenerator(preprocessing_function=preprocess_input, ...
I thought that using the `preprocess_input` of the respective model I want to train is always superior to the `rescale=1./255` approach, since it reflects exactly the preprocessing that was used during training of the pretrained model.
I need some clarification on when to use `rescale=1./255` vs Keras' built-in `preprocess_input` of the respective model when preprocessing images for transfer learning. Does this only make a difference when using pretrained models, i.e. with loaded weights, vs training from scratch?
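For reference, here is a minimal sketch of the two setups I am comparing (the directory path and target size are just placeholders):

```python
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg19 import preprocess_input

# Approach 1: simple rescaling of pixel values to [0, 1]
rescale_gen = ImageDataGenerator(rescale=1./255)

# Approach 2: the model-specific preprocessing that matches the pretrained weights
preprocess_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Both are then consumed the same way, e.g. (placeholder directory):
train_gen = rescale_gen.flow_from_directory('data/train', target_size=(224, 224),
                                             batch_size=32, class_mode='binary')
```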
I had similar questions, and after running the small experiments below, I think you should always use `preprocess_input` when using pretrained models, and use rescaling when training from scratch.
Obviously, when you directly use a pretrained model for inference, you have to use `preprocess_input`: for example, I tried to use `resnet50` on the Kaggle dogs-vs-cats dataset. With `rescale=1./255` it returns index 111 (nematode, nematode worm, roundworm) as the most likely class for all images, whereas with `preprocess_input` it mostly returns indices corresponding to dogs and cats, as expected.
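A rough sketch of that inference check (the image path is a placeholder; this assumes the standard `keras.applications` API):

```python
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image

model = ResNet50(weights='imagenet')

img = image.load_img('dog.1.jpg', target_size=(224, 224))   # placeholder image path
x = np.expand_dims(image.img_to_array(img), axis=0)

# Model-specific preprocessing: predictions come out as dog/cat classes as expected
print(decode_predictions(model.predict(preprocess_input(x.copy())), top=3))

# Plain rescaling instead: the model tends to predict nonsense (e.g. index 111)
print(decode_predictions(model.predict(x / 255.0), top=3))
```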
Then I tried to use `resnet50` with `include_top=False`, frozen weights from ImageNet, one `GlobalAveragePooling2D` layer and one final dense sigmoid layer. I trained it with Adam on 2000 images of the Kaggle dogs-vs-cats dataset (and I used 1000 images as validation). Using rescaling, it does not manage to fit anything after 5 epochs; it always predicts the first class (though strangely the training accuracy is 97%, but when I run `evaluate_generator` on the training images, the accuracy is **50%**). But with `preprocess_input`, it achieves **98%** accuracy on the validation set. Also note that you do not really need the images to be of the same dimensions as the trained models; for example, if I use 150 instead of 224, I still get a **97.5%** accuracy. Without any rescaling or `preprocess_input`, I got a 95% validation accuracy.
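Roughly, the setup looked like this (a sketch only; the exact hyperparameters and directory layout here are assumptions):

```python
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Frozen ResNet50 feature extractor (no classification head)
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
out = Dense(1, activation='sigmoid')(x)
model = Model(base.input, out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Swap preprocessing_function=preprocess_input for rescale=1./255 to reproduce the comparison
gen = ImageDataGenerator(preprocessing_function=preprocess_input)
train = gen.flow_from_directory('data/train', target_size=(224, 224),
                                batch_size=32, class_mode='binary')
val = gen.flow_from_directory('data/validation', target_size=(224, 224),
                              batch_size=32, class_mode='binary')
model.fit_generator(train, epochs=5, validation_data=val)
```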
I tried the same thing with `vgg16`: with rescaling it does manage to fit, but only to 87%, vs 97% using `preprocess_input` and 95% without anything.
Then I trained a small conv network from scratch for 10 epochs: without anything, or using `resnet50`'s `preprocess_input`, it does not fit at all, but with rescaling I got a 70% validation accuracy.
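The from-scratch network was along these lines (a sketch; the exact layer sizes are assumptions):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# When training from scratch, plain rescaling is the option that converged for me
gen = ImageDataGenerator(rescale=1./255)
train = gen.flow_from_directory('data/train', target_size=(150, 150),
                                batch_size=32, class_mode='binary')
model.fit_generator(train, epochs=10)
```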
> First I thought using rescale=1./255 only works when dealing with a pretrained vgg16 model, but I keep seeing examples where it is being used with pre-trained resnet50, inception etc. as well.
The reason that is done is that you need to NORMALIZE your input. The usual formula for min-max normalization is

x' = (x - min(x)) / (max(x) - min(x))

which, for pixel values (min = 0, max = 255), is equivalent to multiplying by `1./255`, so the pixel values of the image will end up between 0 and 1.
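As a quick sanity check of that equivalence (plain numpy, nothing model-specific):

```python
import numpy as np

pixels = np.array([0., 63.75, 127.5, 255.])                      # raw pixel values
minmax = (pixels - pixels.min()) / (pixels.max() - pixels.min())  # min-max normalization
rescaled = pixels * (1. / 255.)                                   # what rescale=1./255 does

print(minmax)     # [0.   0.25 0.5  1.  ]
print(rescaled)   # identical here, because min(x) = 0 and max(x) = 255
```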
The reason for normalizing the input has to do with numerical stability and convergence: technically you do not need it, but with it the neural network has a much better chance of converging, and gradient descent/Adam is far more likely to be stable.
As per:

> Does this only make a difference when using pretrained models, i.e. with loaded weights, vs training from scratch?

No, it is not linked to pretrained models only; it is a common technique when using certain algorithms in machine learning (neural networks being one of them).
If you are interested in REALLY understanding what goes on behind all this and why it is so important to normalize, I strongly recommend taking the Andrew Ng course on machine learning.
If you are using transfer learning in a way where you are only leveraging the network structure but retraining the entire network (perhaps starting from the pretrained weights), you may choose to set up your own preprocessing strategy. That means you can scale by dividing by 255.0, use `preprocess_input`, or even use a custom implementation of preprocessing.
If you are using transfer learning where you are not retraining the entire network but replacing the last layer with a few fully connected dense layers, then it is strongly recommended to use the `preprocess_input` associated with the network you are training. This is because the weights of the layers you are not training are accustomed to a specific preprocessing step. For example, if you look at `preprocess_input` for `InceptionResNetV2` and follow the code path to `_preprocess_numpy_input`, it does not normalize the image in every case, only when the mode is `"tf"` or `"torch"`. So if you trained an `InceptionResNetV2` and normalized the image by dividing by 255, it might not train the classifier the way you intended.
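To illustrate the difference, a small check (this assumes `keras.applications.inception_resnet_v2`, whose `preprocess_input` uses `"tf"` mode and maps pixels to [-1, 1], unlike `1./255` which maps to [0, 1]):

```python
import numpy as np
from keras.applications.inception_resnet_v2 import preprocess_input

x = np.array([[[[0.0, 127.5, 255.0]]]])   # one dummy pixel with three channel values

print(preprocess_input(x.copy()))          # [[[[-1.  0.  1.]]]]  -> "tf" mode, range [-1, 1]
print(x / 255.0)                           # [[[[0.   0.5  1. ]]]] -> plain rescaling, range [0, 1]
```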