Does normalizing images by dividing by 255 leak information between train and test set?

I've seen division by 255 used many times as normalization in CNN tutorials online, and it is applied across the entire dataset before the train/test split.

I was under the impression that the test set should be normalized using the mean/std/max-min etc. of the training set. By dividing the whole dataset by 255, apparently we are giving the training set information about the test set. Is that true?

What's the right approach here?

This:

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std

or this:

x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_train_mean)/x_train_std

or this:

data/255

Thanks

I've been asked to provide background on what I've tried: this seems to be ungoogleable; I haven't found any discussion of it.

edit: Just another thought.

Because both the train and test sets are already on the same scale (i.e. each pixel ranges from 0-255), I assume that dividing by 255 makes no difference: 255 is a fixed constant rather than a statistic estimated from the data, so afterwards they are still on the same scale, just from 0-1.

SCool asked Apr 26 '19



2 Answers

Your guess is correct: dividing an image by 255 simply rescales it from 0-255 to 0-1. (Converting it from int to float makes computation convenient, too.) However, neither is strictly required. If you do zero-center the data, the mean must be computed from the training set alone, so that nothing leaks from the test set (http://cs231n.github.io/neural-networks-2/#datapre):

x_train = (x_train - x_train_mean)

x_test = (x_test - x_train_mean)

Moreover, you can use sklearn's Pipeline class (https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) and its fit() and/or fit_transform() methods to simplify the process: fit on the training data, then transform the test data with the statistics learned there. If you're using Keras, there's a wrapper for it.
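
As a minimal sketch of that pattern (StandardScaler is my assumption here, since the answer doesn't name a specific scaler, and x_train/x_test are assumed to be flattened 2-D arrays of shape (n_samples, n_pixels)):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pipeline with a single scaling step; StandardScaler learns per-pixel
# mean and std from whatever data it is fitted on.
pipe = Pipeline([("scale", StandardScaler())])

x_train = pipe.fit_transform(x_train)  # fit + transform: uses train statistics
x_test = pipe.transform(x_test)        # transform only: reuses train statistics

Because transform() reuses the parameters learned during fit(), the test set never influences the scaling.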

iva answered Oct 01 '22


I will just speculate a bit.

The pixel values in a grayscale image lie in [0, 255]. However, many images occupy only a narrow part of that range; for example, an image's pixels might all lie in [100, 150].

When you divide such an image by 255.0, its range becomes approximately [0.4, 0.6]. But when you compute (im - mean(im))/std(im), that narrow range gets stretched out nicely.

I tested something very simple in Python:

import numpy as np

np.set_printoptions(precision=3)  # print rounded values, as below

def get_zero_mean_std(a):
    # standardize: subtract the mean, divide by the standard deviation
    a = (a - np.mean(a)) / np.std(a)
    print(a)

get_zero_mean_std(np.array([3, 2, 1, 6]))
# [ 0.    -0.535 -1.069  1.604]

get_zero_mean_std(np.array([3, 2, 1, 15]))
# [-0.397 -0.573 -0.749  1.719]

get_zero_mean_std(np.array([3, 2, 1, 3, 1, 2, 1, 1, 2]))
# [ 1.556  0.283 -0.99   1.556 -0.99   0.283 -0.99  -0.99   0.283]

As you can see, standardization puts the values into a nice range.

If I had normalized by 255 (or by the maximum value) instead, the first three values of the second array would have been squeezed into a very narrow range while the last value sat far above them.
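
To make that concrete, here is a quick check I'm adding (dividing by the array's maximum, which plays the same role as dividing by 255):

import numpy as np

a = np.array([3, 2, 1, 15])
print(a / a.max())  # scale to [0, 1] by the maximum
# [0.2   0.133 0.067 1.   ]

The first three values are crammed into [0.067, 0.2] while the last sits at 1.0; standardization, as shown above, spreads them out instead.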

So, long story short, one reason might be that (im - mean(im))/std(im) is a better normalizer than regular division.

smttsp answered Oct 01 '22