 

What is the expected input range for working with Keras VGG models?

I'm trying to use a pretrained VGG16 from Keras, but I'm unsure what the input range should be.

Quick answer, which of these color orders?

  • RGB
  • BGR

And which range?

  • 0 to 255?
  • balanced from about -125 to about +130?
  • 0 to 1?
  • -1 to 1?

I notice the file where the model is defined imports an input preprocessor:

from .imagenet_utils import preprocess_input

But this preprocessor is never used in the rest of the file.

Also, when I check the code for this preprocessor, it has two modes: caffe and tf (tensorflow).

Each mode works differently.
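To make the difference concrete, here is a minimal numpy sketch of the two behaviors as I understand them from reading the code (the function names here are my own; the real function is imagenet_utils.preprocess_input with a mode argument):

```python
import numpy as np

def caffe_mode(x):
    # 'caffe': convert RGB -> BGR, then zero-center each channel with
    # the ImageNet BGR means. No scaling to [0, 1] or [-1, 1].
    x = x[..., ::-1].astype("float64")
    return x - np.array([103.939, 116.779, 123.68])

def tf_mode(x):
    # 'tf': scale pixels from [0, 255] to [-1, 1].
    # No channel swap, no mean subtraction.
    return x / 127.5 - 1.0
```

So the two modes produce entirely different ranges, which is exactly why I'm asking which one the VGG16 weights expect.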

Finally, I can't find consistent documentation on the internet.

So, what is the best range for working? To what range are the model weights trained?

Daniel Möller, asked Oct 07 '17



1 Answer

The model weights were ported from Caffe, so the network expects BGR input.

Caffe uses a BGR color channel scheme for reading image files. This is due to the underlying OpenCV implementation of imread. The assumption of RGB is a common mistake.

You can find the original Caffe model weight files on the VGG website; the link also appears in the Keras documentation.

I think the second range is the closest one. There's no scaling during training, but the authors subtracted the mean pixel value of the ILSVRC-2014 training set. As stated in the original VGG paper, section 2.1:

The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.

This sentence is exactly what imagenet_utils.preprocess_input(mode='caffe') does:

  1. Convert from RGB to BGR: because keras.preprocessing.image.load_img() loads images in RGB format, this conversion is required for VGG16 (and all models ported from caffe).
  2. Subtract the mean BGR values: (103.939, 116.779, 123.68) is subtracted from the image array.
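The two steps above can be sketched in numpy (the helper name here is hypothetical; the real function is keras.applications.vgg16.preprocess_input):

```python
import numpy as np

# Hypothetical mirror of keras.applications.vgg16.preprocess_input
# (mode='caffe'); input is an RGB array with values in [0, 255].
def vgg16_preprocess(rgb):
    bgr = rgb[..., ::-1].astype("float64")             # step 1: RGB -> BGR
    return bgr - np.array([103.939, 116.779, 123.68])  # step 2: subtract means

# Check the resulting range on extreme pixel values: it spans roughly
# -123.7 to +151.1, i.e. zero-centered but not scaled.
extremes = vgg16_preprocess(np.array([[[0.0, 0.0, 0.0],
                                       [255.0, 255.0, 255.0]]]))
print(extremes.min(), extremes.max())
```

This confirms the "balanced around zero, roughly -125 to +130 per channel" option from the question: the values are mean-centered but never rescaled. In practice, call the Keras function directly rather than hand-rolling this.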

The preprocessor is not used in vgg16.py itself. It's imported there so that users can call keras.applications.vgg16.preprocess_input(rgb_img_array) without caring where the model weights came from. The argument to preprocess_input() is always an image array in RGB format; if the model was trained with Caffe, preprocess_input() will convert the array to BGR.

Note that preprocess_input() is not intended to be called from the imagenet_utils module directly. If you are using VGG16, call keras.applications.vgg16.preprocess_input() and the images will be converted to the format and range that VGG16 was trained on. Similarly, if you are using Inception V3, call keras.applications.inception_v3.preprocess_input() and the images will be converted to the range that Inception V3 was trained on.

Yu-Yang, answered Sep 28 '22