I've seen division by 255 used many times as normalization in CNN tutorials online, and it's applied across the entire dataset before the train/test split.
I was under the impression that the test set should be normalized using the mean/std/min-max etc. of the training set. By applying /255 across the whole dataset, it seems we're letting information about the test set leak into training. Is that true?
What's the right approach here?
This:
x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_test_mean)/x_test_std
or this:
x_train = (x_train - x_train_mean)/x_train_std
x_test = (x_test - x_train_mean)/x_train_std
or this:
data/255
Thanks
I've been asked to provide background on what I've tried: this seems to be ungoogleable; I haven't found any discussion of it.
edit: Just another thought.
Because both the train and test sets are already on the same scale (i.e., each pixel ranges from 0 to 255), I assume that dividing by 255 doesn't make a difference: they remain on the same scale, just from 0 to 1.
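A quick sanity check of that reasoning (a minimal sketch with made-up data):
import numpy as np

data = np.random.randint(0, 256, size=(10, 4)).astype(np.float32)
train, test = data[:8], data[8:]

# Dividing the whole dataset by 255 before the split...
scaled_all = data / 255.0

# ...gives exactly the same result as dividing each split separately,
# because 255 is a fixed constant of the encoding, not a statistic
# estimated from the data; no information can cross the split.
assert np.array_equal(scaled_all[:8], train / 255.0)
assert np.array_equal(scaled_all[8:], test / 255.0)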
Normalizing the values to the range 0 to 1 keeps the numbers small, so computation becomes easier and faster. Since pixel values range from 0 to 255, dividing all the values by 255 converts them to the range 0 to 1.
Image normalization is a typical process in image processing that changes the range of pixel intensity values. Its usual purpose is to bring an input image into a range of pixel values that is more familiar or natural, hence the term normalization.
Normalizing image inputs: data normalization is an important step which ensures that each input parameter (each pixel, in this case) has a similar data distribution. This makes convergence faster while training the network.
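For reference, a minimal sketch of that rescaling (the array here is made up for illustration):
import numpy as np

# A hypothetical batch of 8-bit grayscale images, values in [0, 255].
images = np.random.randint(0, 256, size=(2, 4, 4), dtype=np.uint8)

# Convert to float and rescale to [0, 1].
images_scaled = images.astype(np.float32) / 255.0
print(images_scaled.min(), images_scaled.max())  # both within [0.0, 1.0]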
Your guess is correct: dividing an image by 255 simply rescales it from 0-255 to 0-1 (and converting it from int to float makes computation convenient too). However, neither step is strictly required. If you do zero-center the data, though, the mean must not leak into the test set (http://cs231n.github.io/neural-networks-2/#datapre):
x_train = (x_train - x_train_mean)
x_test = (x_test - x_train_mean)
Moreover, you can use sklearn's Pipeline class (https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) and its fit() and/or fit_transform() methods to simplify the process. If you're using Keras, there's a wrapper for it.
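For example, a sketch of that pattern with scikit-learn's StandardScaler (the shapes and flattened-image layout here are assumptions for illustration):
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-ins for flattened images: rows are samples, columns are pixels.
x_train = np.random.randint(0, 256, size=(100, 784)).astype(np.float64)
x_test = np.random.randint(0, 256, size=(20, 784)).astype(np.float64)

scaler = StandardScaler()
# fit_transform estimates mean and std from the training set only...
x_train_std = scaler.fit_transform(x_train)
# ...and transform reuses those training statistics on the test set,
# so no test-set information leaks into the preprocessing.
x_test_std = scaler.transform(x_test)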
I will just speculate a bit.
The pixel values in a grayscale image lie in [0, 255], but many images occupy only a narrow band of that range; an image might span, say, [100, 150].
If you scale such an image by 255.0, the resulting range is roughly [0.4, 0.6]. If you instead compute (im - mean(im))/std(im), the range is expanded nicely.
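To make that concrete, here's a quick check with a hypothetical narrow-range image (the values are made up for illustration):
import numpy as np

im = np.array([100., 110., 125., 140., 150.])  # narrow-range pixels

# Plain rescaling keeps the values bunched together...
print(im / 255.0)                    # ~[0.392, 0.431, 0.490, 0.549, 0.588]
# ...while standardizing spreads them across a wider range.
print((im - im.mean()) / im.std())   # ~[-1.356, -0.813, 0.0, 0.813, 1.356]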
I tested something very simple in Python:
import numpy as np
np.set_printoptions(precision=3)

def get_zero_mean_std(a):
    # Standardize: subtract the mean, then divide by the standard deviation.
    a = (a - np.mean(a)) / np.std(a)
    print(a)

get_zero_mean_std(np.array([3, 2, 1, 6]))
# [ 0.    -0.535 -1.069  1.604]
get_zero_mean_std(np.array([3, 2, 1, 15]))
# [-0.397 -0.573 -0.749  1.719]
get_zero_mean_std(np.array([3, 2, 1, 3, 1, 2, 1, 1, 2]))
# [ 1.556  0.283 -0.99   1.556 -0.99   0.283 -0.99  -0.99   0.283]
As you can see, it puts the values in a nice range. If I had normalized by 255 or by the maximum value, the first three values of the second array would have sat in a very narrow range while the last value would have been much higher.
So, long story short, one reason might be that (im - mean(im))/std(im) is a better normalizer than plain division.