
Cropping/Scaling ImageNet Images

ImageNet images are all different sizes, but neural networks need a fixed size input.

One solution is to take the largest crop that will fit in the image, centered on the image's center point. This works but has some drawbacks. Important parts of the object of interest are often cut out, and there are even cases where the correct object is completely missing while another object that belongs to a different class is visible, meaning your model will be trained on a wrong label for that image.
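For concreteness, here is a minimal sketch of the center-crop geometry described above. The function name `center_crop_box` is just illustrative, and it assumes the crop is the largest centered square; it only computes the crop rectangle and does not touch pixel data.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square crop."""
    side = min(width, height)          # largest square that fits in the image
    left = (width - side) // 2         # split the leftover width evenly
    top = (height - side) // 2         # split the leftover height evenly
    return left, top, left + side, top + side


# A 500x375 landscape image loses 62-63 pixel columns on each side.
print(center_crop_box(500, 375))  # (62, 0, 437, 375)
```

The returned 4-tuple follows the (left, upper, right, lower) convention, so it could be passed to something like Pillow's `Image.crop`, assuming that is the image library in use.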

Another solution would be to use the entire image and zero-pad it so that every image has the same dimensions. This seems like it would interfere with the training process though, since the model could learn to look for vertical/horizontal bands of black near the edges of images.
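A rough sketch of the zero-padding idea, assuming images are NumPy arrays of shape (height, width) or (height, width, channels) and that the image is centered on the padded canvas. The helper name `zero_pad_to` is hypothetical.

```python
import numpy as np


def zero_pad_to(image, target_h, target_w):
    """Center `image` on a black (all-zero) canvas of size target_h x target_w."""
    h, w = image.shape[:2]
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target canvas")
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    # Keep any trailing channel axis and the original dtype.
    canvas = np.zeros((target_h, target_w) + image.shape[2:], dtype=image.dtype)
    canvas[top:top + h, left:left + w] = image
    return canvas


wide = np.ones((2, 4, 3), dtype=np.uint8)   # a tiny 4x2 "landscape" image
padded = zero_pad_to(wide, 4, 4)
print(padded.shape)                          # (4, 4, 3)
```

The black rows above and below the image in `padded` are exactly the kind of border the question worries the model might latch onto.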

What is commonly done?

asked May 03 '16 23:05 by Frobot




1 Answer

There are several approaches:

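  • Multiple crops. For example, AlexNet was trained on random 224x224 patches taken from 256x256 images and, at test time, averaged its predictions over 5 crops (center, top-left, top-right, bottom-left, bottom-right) and their horizontal reflections.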
  • Random crops. Just take a number of random crops from the image and hope that the neural network will not be biased.
  • Resize and deform. Resize the image to a fixed size without considering the aspect ratio. This will deform the image contents, but you can be sure that no content is cut off.
  • Variable-sized inputs. Do not crop; train the network on variable-sized images, using something like Spatial Pyramid Pooling to extract a fixed-size feature vector that can be used with fully connected layers.
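The first two approaches in the list come down to choosing crop rectangles. The sketch below shows one way to compute them; the names `five_crop_boxes` and `random_crop_box` are illustrative, not taken from any particular library, and the crop is assumed to be a square no larger than the image.

```python
import random


def five_crop_boxes(width, height, crop):
    """Return the four corner crops and the center crop as (left, top, right, bottom)."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    max_x, max_y = width - crop, height - crop
    origins = {
        "top_left": (0, 0),
        "top_right": (max_x, 0),
        "bottom_left": (0, max_y),
        "bottom_right": (max_x, max_y),
        "center": (max_x // 2, max_y // 2),
    }
    return {name: (x, y, x + crop, y + crop) for name, (x, y) in origins.items()}


def random_crop_box(width, height, crop, rng=random):
    """Return one uniformly placed crop box that lies fully inside the image."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    x = rng.randint(0, width - crop)    # randint is inclusive on both ends
    y = rng.randint(0, height - crop)
    return x, y, x + crop, y + crop


boxes = five_crop_boxes(256, 256, 224)
print(boxes["center"])  # (16, 16, 240, 240)
```

Passing a seeded `random.Random` instance as `rng` makes the random crops reproducible, which can help when debugging a data pipeline.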

You could take a look at how the latest ImageNet networks, like VGG and ResNet, are trained. Their papers usually describe this step in detail.
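As a pointer to what those descriptions typically involve: the VGG and ResNet papers rescale each image so that its shorter side has a chosen length, keeping the aspect ratio, and then take fixed-size crops from the result. A minimal sketch of the size computation follows; the function name is hypothetical and the rounding choice is an assumption.

```python
def resize_shorter_side(width, height, target):
    """Return (new_width, new_height) so the shorter side equals `target`.

    The aspect ratio is preserved up to rounding of the longer side.
    """
    if width <= height:
        return target, round(height * target / width)
    return round(width * target / height), target


# A 500x375 image rescaled so its shorter side is 256 pixels.
print(resize_shorter_side(500, 375, 256))  # (341, 256)
```

After this step a 224x224 crop always fits inside the image, and much less of the content is lost than when cropping directly from the original resolution.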

answered Oct 15 '22 21:10 by Dr. Snoopy