ImageNet images come in all different sizes, but neural networks need a fixed-size input.
One solution is to take the largest crop that fits in the image, centered on the image's center point. This works but has some drawbacks. Important parts of the object of interest are often cut out, and there are even cases where the correct object is missing entirely while an object belonging to a different class is still visible, so the model is trained on the wrong label for that image.
Another solution would be to use the entire image and zero-pad it so that every image has the same dimensions. This seems like it would interfere with training, though: the model might learn to look for vertical and horizontal black strips near the edges of images.
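To make the two options concrete, here is a minimal sketch of both using Pillow (the 224 and 512 sizes are only placeholder values; the actual sizes depend on the network and dataset):

```python
from PIL import Image

def center_crop(img: Image.Image, size: int = 224) -> Image.Image:
    """Cut a fixed-size window from the image center, with no rescaling.
    Anything outside the window is discarded."""
    left = (img.width - size) // 2
    top = (img.height - size) // 2
    return img.crop((left, top, left + size, top + size))

def zero_pad(img: Image.Image, size: int = 512) -> Image.Image:
    """Keep the whole image and paste it onto an all-zero (black) canvas,
    so every example ends up with the same dimensions. Assumes `size` is
    at least as large as the largest image."""
    canvas = Image.new("RGB", (size, size))
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas
```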
What is commonly done?
The common approach combines proportional scaling with cropping. Proportional scaling, much like it sounds, means scaling an image toward the target size while maintaining its proportions (aspect ratio). Cropping then removes the excess along the longer dimension so that every image ends up with the same fixed size.
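As a minimal sketch of this scale-then-crop preprocessing with Pillow (the 256/224 sizes follow the convention used by many ImageNet models, but they are just example values):

```python
from PIL import Image

def scale_and_center_crop(path: str, crop: int = 224, short_side: int = 256) -> Image.Image:
    """Proportionally scale the shorter side to `short_side`, preserving the
    aspect ratio, then center-crop a fixed `crop` x `crop` square."""
    img = Image.open(path).convert("RGB")

    # Proportional scaling: shorter side -> short_side, proportions kept.
    scale = short_side / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)),
                     Image.BILINEAR)

    # Cropping: cut the central crop x crop square out of the scaled image.
    left = (img.width - crop) // 2
    top = (img.height - crop) // 2
    return img.crop((left, top, left + crop, top + crop))
```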
There are several approaches. You could take a look at how the latest ImageNet networks, such as VGG and ResNet, are trained; their papers usually describe this preprocessing step in detail.
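For reference, here is a sketch of the kind of pipeline those recipes typically use, written with torchvision (the exact crop sizes and augmentations vary from paper to paper; the normalization constants are the usual ImageNet channel statistics):

```python
import torchvision.transforms as T

# Training: random scaled crop plus horizontal flip, as in many ImageNet recipes.
train_transform = T.Compose([
    T.RandomResizedCrop(224),    # random area/aspect-ratio crop, resized to 224x224
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet channel means
                std=[0.229, 0.224, 0.225]),   # ImageNet channel stds
])

# Evaluation: proportional resize of the shorter side, then a center crop.
eval_transform = T.Compose([
    T.Resize(256),       # shorter side -> 256, aspect ratio preserved
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```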