To preface, I am new to the field of ML/CV, and am currently in the process of training a custom conv net using Caffe.
I am interested in mean image subtraction to achieve basic data normalization on my training images. However, I am confused as to how mean subtraction works and exactly what benefits it has.
I know that a "mean image" can be calculated from the training set, which is then subtracted from the training, validation, and testing sets to make the network less sensitive to differing background and lightening conditions.
Does this involve calculating the mean of all pixels in each image and then averaging those means? Or is the value at each pixel coordinate averaged across all images in the set (i.e., the average value of the pixels at location (1,1) over all images)? The latter would require all images to be the same size...
Also, for color images (3 channels), is each channel averaged individually?
Any clarity would be appreciated.
mean: simply divide the sum of all pixel values by the total pixel count, i.e. the number of pixels in the dataset, computed as len(df) * image_size * image_size.
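A minimal NumPy sketch of this global scalar mean, assuming the images have already been loaded into a single array (the array name and shape are illustrative):

```python
import numpy as np

# Hypothetical stack of N grayscale training images, shape (N, H, W).
images = np.random.rand(100, 32, 32).astype(np.float32)

# Global scalar mean: sum of all pixel values divided by the total
# pixel count, N * H * W (the len(df) * image_size * image_size above).
global_mean = images.sum() / images.size
assert np.isclose(global_mean, images.mean())
```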
In deep learning, there are in fact different practices as to how to subtract the mean image.
The first way is to subtract the mean image, as @lejlot described. There is an issue, however, if your dataset images are not all the same size: you need to make sure all dataset images have the same size before using this method (e.g., resize the original images, or crop same-size patches from them). This is the approach used in the original ResNet paper (He et al., "Deep Residual Learning for Image Recognition"); a sketch follows below.
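A minimal sketch of the mean-image approach, assuming all images have already been resized to a common shape (the array names are illustrative):

```python
import numpy as np

# Hypothetical training set: N RGB images of identical size, shape (N, H, W, C).
train_images = np.random.rand(100, 224, 224, 3).astype(np.float32)

# Mean image: average over the dataset axis only, so each pixel position
# (i, j) and channel c keeps its own mean. Shape: (H, W, C).
mean_image = train_images.mean(axis=0)

# Subtract the same mean image from the training, validation, and test sets.
train_centered = train_images - mean_image
```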
The second way, which is more popular, is to subtract the per-channel mean from the original image. With this approach you do not need to resize or crop the original image; you just calculate the per-channel mean from the training set. This is used widely in deep learning, e.g. in Caffe, Keras, and PyTorch. (PyTorch also divides each channel by its standard deviation.)
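A minimal sketch of per-channel normalization, assuming images in (N, H, W, C) layout; the actual values will of course depend on your own training set:

```python
import numpy as np

# Hypothetical training set: N RGB images, shape (N, H, W, C).
train_images = np.random.rand(100, 224, 224, 3).astype(np.float32)

# Per-channel mean and std: average over every axis except the channel axis.
# Result shape: (3,), one value per channel.
channel_mean = train_images.mean(axis=(0, 1, 2))
channel_std = train_images.std(axis=(0, 1, 2))

# Subtract the mean; PyTorch-style normalization also divides by the std.
train_normalized = (train_images - channel_mean) / channel_std
```

In PyTorch, this is the per-channel operation that torchvision.transforms.Normalize(mean, std) applies to each image tensor at load time.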
The mean image is an image whose (i, j, c) pixel is the average of the (i, j, c) pixels from all images, so you take the mean separately for each position and each color channel. This requires all images to have the same size, of course; otherwise it is not defined. Also, it is not really about being less sensitive to different conditions; it has nothing to do with that. It is literally just to keep the initial activations in a reasonable range, nothing else.
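To see the "reasonable range" point concretely, a small sketch (array names are illustrative): after subtracting the mean image, every pixel position is zero-centered across the dataset, so the first-layer inputs are no longer offset by a large constant.

```python
import numpy as np

# Raw pixels in [0, 255], shape (N, H, W, C).
images = np.random.rand(50, 8, 8, 3).astype(np.float32) * 255.0
mean_image = images.mean(axis=0)
centered = images - mean_image

# Each (i, j, c) position now averages to ~0 across the dataset,
# instead of ~127.5 for the raw pixels.
print(centered.mean(axis=0).max())  # close to 0
```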