In this page (https://pytorch.org/vision/stable/models.html), it says that "All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]".
Shouldn't the usual mean and std for normalization be [0.5, 0.5, 0.5] and [0.5, 0.5, 0.5]? Why does it use such strange values?
The mean and std of ImageNet are mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. If your images are not similar to ImageNet, e.g. medical images, it is advised to calculate the mean and std of your own dataset and use those to normalize the images.
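For example, here is a minimal sketch of computing per-channel mean and std over your own dataset. The "path/to/images" folder and the ImageFolder layout are assumptions; adapt them to however your data is stored:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# "path/to/images" is a placeholder; ImageFolder assumes one subfolder per class.
dataset = datasets.ImageFolder(
    "path/to/images",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),  # scales pixels to [0, 1]
    ]),
)
loader = DataLoader(dataset, batch_size=64, num_workers=2)

# Accumulate per-channel sums of x and x^2 over every pixel, then
# use var = E[x^2] - E[x]^2 to get the standard deviation.
n_pixels = 0
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
for images, _ in loader:
    b, c, h, w = images.shape  # images: (B, 3, H, W)
    n_pixels += b * h * w
    channel_sum += images.sum(dim=(0, 2, 3))
    channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
print(mean, std)
```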
A tensor in PyTorch is like a NumPy array, with the difference that tensors can utilize the power of a GPU whereas NumPy arrays cannot. To normalize a tensor, we transform it so that its mean and standard deviation become 0 and 1 respectively.
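Concretely, torchvision's Normalize transform just applies (x - mean) / std per channel, which you can verify by hand on a dummy tensor:

```python
import torch
from torchvision import transforms

# Normalize applies (x - mean) / std independently to each channel.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

x = torch.rand(3, 224, 224)  # dummy image tensor already in [0, 1]
y = normalize(x)

# The same computation written out explicitly:
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
assert torch.allclose(y, (x - mean) / std)
```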
Using the mean and std of ImageNet is a common practice. They were calculated based on millions of images. If you want to train from scratch on your own dataset, you can calculate the new mean and std. Otherwise, using the ImageNet-pretrained model with its own mean and std is recommended.
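Putting it together, a sketch of the standard preprocessing pipeline for an ImageNet-pretrained torchvision model (the "example.jpg" path is a placeholder, and note that newer torchvision versions replace pretrained=True with a weights= argument):

```python
import torch
from PIL import Image
from torchvision import models, transforms

# The usual inference pipeline for ImageNet-pretrained models:
# resize, center-crop to 224, scale to [0, 1], then normalize
# with the ImageNet statistics quoted in the docs.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)
model.eval()

img = Image.open("example.jpg")          # placeholder image path
batch = preprocess(img).unsqueeze(0)     # add batch dim -> (1, 3, 224, 224)
with torch.no_grad():
    logits = model(batch)
```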