
CNN - Image Resizing VS Padding (keeping aspect ratio or not?)

People usually just resize every image into a square when training a CNN (for example, ResNet takes a 224x224 square image), but that looks ugly to me, especially when the aspect ratio is far from 1.

(In fact, this might change the ground truth: the label an expert would give the distorted image could differ from the label they would give the original.)

So now I resize the image to, say, 224x160, keeping the original aspect ratio, and then pad it with 0s by pasting it at a random location in an all-black 224x224 image.
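A minimal NumPy sketch of the two preprocessing options being compared, assuming the image is a NumPy array shaped `(height, width[, channels])`. The helper names `resize_nearest`, `letterbox_random` and `stretch` are hypothetical, and the nearest-neighbour resize is only a stand-in for the interpolating resize an image library would normally provide.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (a simple stand-in for a proper library resize)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

def stretch(img, size=224):
    """The 'usual' approach: resize straight to a square, distorting the aspect ratio."""
    return resize_nearest(img, size, size)

def letterbox_random(img, size=224, rng=None):
    """Scale the longer side to `size` (keeping the aspect ratio), then paste the
    result at a random offset inside a black size x size canvas."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = resize_nearest(img, new_h, new_w)
    canvas = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top = rng.integers(0, size - new_h + 1)    # random vertical offset
    left = rng.integers(0, size - new_w + 1)   # random horizontal offset
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, (top, left, new_h, new_w)

rng = np.random.default_rng(0)
photo = rng.random((448, 320, 3))   # fake 448x320 RGB image (height x width)
padded, (top, left, new_h, new_w) = letterbox_random(photo, rng=rng)
squashed = stretch(photo)
print(padded.shape, (new_h, new_w), squashed.shape)  # (224, 224, 3) (224, 160) (224, 224, 3)
```

For this 448x320 input the content becomes 224x160 and the remaining 224x64 pixels of black padding are split between the left and right sides at a random point, while `stretch` squeezes the whole image into the square instead.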

My approach doesn't seem original to me, and yet I cannot find any information whatsoever comparing it with the "usual" approach. Funky!

So, which approach is better, and why? (If the answer is data dependent, please share your thoughts on when one is preferable to the other.)

Asked Dec 07 '17 14:12 by Yoni Keren




2 Answers

According to Jeremy Howard, padding a big piece of the image (a 224x64 strip in the 224x160 example) has the following effect: the CNN has to learn that the black part of the image is irrelevant and does not help distinguish between the classes (in a classification setting), since there is no correlation between the pixels in the black part and membership in a given class. Because you are not hard-coding this, the CNN has to learn it by gradient descent, which may take some extra epochs. For this reason, padding is fine if you have lots of images and computational power, but if you are short on either, resizing should work better.

Answered Oct 17 '22 21:10 by David Masip


Sorry this is late, but this answer is for anyone facing the same issue.

First, if resizing without keeping the aspect ratio would distort important features, then you have to use zero padding.

Zero padding doesn't make the network take longer to learn because of the large black area itself, but because the unpadded image can sit at many different locations inside the padded image: there are many ways to pad the same image.

In regions where all pixels are zero, the output of the convolution operation is zero (ignoring the bias term), and the same holds for max or average pooling. Also, during backpropagation the gradient of a weight is the upstream gradient multiplied by the input that weight sees, so zero inputs contribute nothing to the weight's update, whatever the activation function. So, at least in the first layer, the large black area produces no weight updates in this sense.
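A small NumPy check of the zero-input argument, assuming a single-channel image, a hand-written "valid" cross-correlation (what CNN libraries call convolution) and an arbitrary upstream gradient. `conv2d_valid` and `weight_grad` are hypothetical helpers, not a library API. Windows lying entirely in the black border output exactly zero without a bias (and exactly the bias with one), and they add nothing to the kernel's gradient; because the kernel is shared, it is still updated by the windows that overlap the image content.

```python
import numpy as np

def conv2d_valid(x, w, b=0.0):
    """Naive single-channel 'valid' cross-correlation plus a scalar bias."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def weight_grad(x, upstream, kh, kw):
    """dL/dw for conv2d_valid: each window's input patch scaled by its upstream gradient."""
    grad = np.zeros((kh, kw))
    for i in range(upstream.shape[0]):
        for j in range(upstream.shape[1]):
            grad += upstream[i, j] * x[i:i + kh, j:j + kw]
    return grad

rng = np.random.default_rng(0)
padded = np.zeros((12, 12))
padded[4:8, 3:9] = rng.random((4, 6))      # image content pasted into a black canvas
w = rng.standard_normal((3, 3))

out = conv2d_valid(padded, w)              # shape (10, 10), no bias
# Mark the 3x3 windows that see only black pixels
black = np.array([[not padded[i:i + 3, j:j + 3].any() for j in range(10)]
                  for i in range(10)])
upstream = rng.standard_normal(out.shape)  # arbitrary dL/d(out)

print(black.sum(), np.abs(out[black]).max())              # number of border windows, max |output| there
print(np.unique(conv2d_valid(padded, w, b=0.5)[black]))   # [0.5]: with a bias the border is not zero
grad_black = weight_grad(padded, np.where(black, upstream, 0.0), 3, 3)
print(np.abs(grad_black).max())                            # 0.0: no update from border windows
```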

However, the position of the unpadded image inside the padded image does affect training. This is due not to the convolution or pooling layers but to the last fully connected layer(s). For example, suppose the unpadded image is placed on the left of the padded image and flattening the last convolution or pooling layer gives [1, 0, 0], while the same image placed on the right gives [0, 0, 1]. Then, for a classification problem, the fully connected layer(s) must learn that [1, 0, 0] and [0, 0, 1] mean the same thing.
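A minimal NumPy illustration of this point, assuming a single 3x3 filter with no bias or activation and a hypothetical `conv2d_valid` helper. Moving the same content four columns to the right only moves the convolution output four columns to the right (convolution is translation-equivariant), yet the flattened feature vectors that a fully connected layer would receive are different.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive single-channel 'valid' cross-correlation, no bias."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * w) for j in range(ow)]
                     for i in range(oh)])

rng = np.random.default_rng(1)
content = rng.random((4, 4))       # the unpadded image
w = rng.standard_normal((3, 3))    # one convolution filter

on_left = np.zeros((6, 12))
on_left[1:5, 2:6] = content        # pasted on the left of the canvas
on_right = np.zeros((6, 12))
on_right[1:5, 6:10] = content      # same content, 4 columns further right

f_left = conv2d_valid(on_left, w)  # shape (4, 10)
f_right = conv2d_valid(on_right, w)

# Same features, just shifted by 4 columns...
print(np.array_equal(f_left[:, :6], f_right[:, 4:]))    # True
# ...but a fully connected layer sees two different input vectors
print(np.array_equal(f_left.ravel(), f_right.ravel()))  # False
```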

Therefore, learning invariance to all these possible positions is what makes training take more time. If you have 1,000,000 images, resizing leaves you with the same number of images; if you pad and want to cover different possible locations (say, 10 random ones per image), you effectively have 10,000,000 images, so training takes roughly 10 times longer.
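The arithmetic can be made explicit with two hypothetical helper functions, under the answer's simplifying assumption that training cost is proportional to the number of examples seen. `num_placements` counts every integer offset at which a resized image fits inside the square canvas; for the question's 224x160 image in a 224x224 canvas there are 65, of which the example above samples 10 per image.

```python
def num_placements(canvas: int, h: int, w: int) -> int:
    """Number of integer (top, left) offsets at which an h x w image fits in a canvas x canvas square."""
    return (canvas - h + 1) * (canvas - w + 1)

def samples_per_epoch(n_images: int, positions_per_image: int) -> int:
    """Training examples per epoch if every image is shown at several padded positions."""
    return n_images * positions_per_image

print(num_placements(224, 160, 224))     # 65
print(samples_per_epoch(1_000_000, 1))   # 1000000  (resizing: one version per image)
print(samples_per_epoch(1_000_000, 10))  # 10000000 (padding at 10 random positions)
```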

That said, it depends on your problem and what you want to achieve. Also, testing both methods will not hurt.

Answered Oct 17 '22 20:10 by Talal Alrawajfeh