What is the role of preprocess_input() function in Keras VGG model?

This question is a follow-up to the discussion in the comments of this answer.

From what I understand, the preprocess_input() function performs mean subtraction (and, in some modes, std-dev division) on the input images. The means are those computed on the ImageNet-1K database on which VGG was trained.
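For reference, here is a minimal sketch of how preprocess_input() is typically used with Keras's VGG16; the random image is just a stand-in for real data:

```python
# Minimal sketch: standard use of preprocess_input with Keras VGG16.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")

# VGG16 expects RGB images in the [0, 255] range before preprocessing.
img = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")
x = preprocess_input(img.copy())  # RGB -> BGR flip + per-channel mean subtraction
preds = model.predict(x)
```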

But this answer says that when using VGG features as a loss function, preprocess_input() is not required and we just need to normalize the image to the [0,1] range before passing it to VGG. This confuses me...

  1. If we don't preprocess, then the input will be in a different range than the images used to train VGG. How are the VGG features still valid?
  2. From what I understand from this answer, we should have images in the [0,255] range and the preprocess_input() function takes care of the normalization. From the source code, I understand that for caffe models there is no normalization to the [0,1] range; instead, the channels are flipped to BGR and the per-channel means are subtracted (see the sketch after this list). How would merely normalizing the network output to the [0,1] range, as suggested in the comments of this answer, achieve the same?
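For context, here is a rough re-implementation of what preprocess_input() does for VGG in "caffe" mode, based on my reading of the Keras source; the function name here is only for illustration:

```python
# Sketch of VGG's "caffe"-mode preprocessing: channel flip + mean subtraction.
# Note there is no scaling to [0, 1] and no std-dev division in this mode.
import numpy as np

IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg_preprocess(rgb_batch):
    """rgb_batch: float32 array in [0, 255], shape (N, H, W, 3), RGB order."""
    bgr = rgb_batch[..., ::-1]        # RGB -> BGR, as the original weights expect
    return bgr - IMAGENET_BGR_MEAN    # per-channel mean subtraction only
```

Merely dividing by 255 produces values in [0,1] that are neither mean-centred nor in BGR order, so it is clearly not the same transform.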

Edit 1:
I'm considering models that output images; this is not specific to a single model. One example is an image-denoising network: the input to my network is a noisy image and its output is a denoised image. I want to minimize the MSE between the denoised image and the ground-truth image in VGG feature space. Whatever the range of my network's output, I can easily map it to [0,255] by multiplying by appropriate factors. Similarly, I can apply any required preprocessing to my network's output (subtract the mean, divide by the std-dev).
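For concreteness, here is a hedged sketch of such a VGG feature-space loss, assuming the network output is in [0,1]; the choice of block3_conv3 as the feature layer is an arbitrary assumption:

```python
# Sketch of a perceptual (VGG feature-space) MSE loss for an image network
# whose outputs are in [0, 1]. The feature layer choice is an assumption.
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

vgg = VGG16(weights="imagenet", include_top=False)
feat = tf.keras.Model(vgg.input, vgg.get_layer("block3_conv3").output)
feat.trainable = False  # VGG is used only as a fixed feature extractor

def perceptual_loss(y_true, y_pred):
    # Map [0, 1] -> [0, 255], then apply VGG's own preprocessing.
    y_true = preprocess_input(y_true * 255.0)
    y_pred = preprocess_input(y_pred * 255.0)
    return tf.reduce_mean(tf.square(feat(y_true) - feat(y_pred)))
```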

Empirically, I found that the output of the preprocess function is in the approximate range [-128,151], so the VGG network was trained on images in this range. Now, if I feed it images (or tensors from my network's output) in the range [0,1], the convolutions would be fine, but the biases would cause a problem, right? To elaborate: for images in the range [-128,151], a layer of the VGG network may have learnt a bias of 5. When I feed an image in the range [0,1] to the VGG network, that bias disrupts everything, right?
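To make the concern concrete, here is a toy numeric illustration (the weight and bias values are made up): a ReLU unit whose bias was learnt at the trained input scale behaves very differently when the input is rescaled, because the bias does not rescale with the input:

```python
# Toy illustration: a fixed bias dominates when the input scale shrinks.
import numpy as np

w, b = 0.05, 5.0                    # hypothetical learnt weight and bias
relu = lambda z: np.maximum(z, 0.0)

x_trained = np.array([-120.0, 0.0, 140.0])  # roughly the [-128, 151] range
x_small = x_trained / 255.0                 # rescaled to roughly [-0.5, 0.55]

print(relu(w * x_trained + b))  # [ 0.  5. 12.] - w*x is comparable to the bias
print(relu(w * x_small + b))    # [4.98 5.   5.03] - the unit is nearly constant
```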

I'm not training the VGG model. I'm using the weights from the model trained on the ImageNet-1K database.

asked Oct 16 '22 by Nagabhushan S N
1 Answer

In general, you should not ignore or change the normalization of the data on which a model was trained. Doing so can break the model in unexpected ways, and since you are using the features in another learning model, it may appear to work while any degradation in performance stays hidden.

This is especially true for models that use saturating activations; for example, with ReLU you might get more zeros than you would with properly normalized data.
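One way to probe this empirically (a sketch, not part of the original answer) is to count dead ReLU activations in an early VGG layer with and without the proper preprocessing; exact numbers will vary with the input:

```python
# Sketch: fraction of zero ReLU activations under correct vs naive scaling.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

vgg = VGG16(weights="imagenet", include_top=False)
layer = tf.keras.Model(vgg.input, vgg.get_layer("block1_conv2").output)

img = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")

a_proper = layer(preprocess_input(img.copy())).numpy()
a_naive = layer(img / 255.0).numpy()  # [0, 1] input, no mean subtraction

print("zero fraction, preprocessed:", np.mean(a_proper == 0.0))
print("zero fraction, [0, 1] input:", np.mean(a_naive == 0.0))
```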

Answers to your specific questions:

  1. Yes, the features would be in a different range for VGG and other networks. Whether they are still valid is another issue; there is a performance loss because the expected normalization was not applied.

  2. Changing the normalization scheme does not produce the same kind of normalization as the original, so it does not achieve the same thing. The code in that answer works, but conceptually it is not doing the right thing (see the comparison below).
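A one-line check makes the mismatch explicit (a sketch; any real image would do in place of the random data):

```python
# The two "normalizations" produce very different tensors.
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input

img = np.random.uniform(0, 255, size=(1, 8, 8, 3)).astype("float32")
print(np.allclose(preprocess_input(img.copy()), img / 255.0))  # False
```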

answered Nov 15 '22 by Dr. Snoopy