CNN - Image Resizing VS Padding (keeping aspect ratio or not?)

While people usually just resize every image into a square when training a CNN (for example, ResNet takes a 224x224 square image), that looks ugly to me, especially when the aspect ratio is far from 1.

(In fact, that might change the ground truth: the label an expert would give the distorted image could differ from the one they would give the original.)
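To put a number on the distortion, here is a minimal sketch that computes how unevenly a plain resize to a square stretches the two axes. The function name `squash_distortion` and the 448x320 example size are assumptions made for illustration, not something from the question.

```python
def squash_distortion(height, width, size=224):
    """Return (scale_y, scale_x, stretch) for a plain resize to size x size.

    stretch is the ratio of the larger scale factor to the smaller one:
    1.0 means no distortion, larger values mean stronger distortion.
    """
    scale_y = size / height
    scale_x = size / width
    stretch = max(scale_y, scale_x) / min(scale_y, scale_x)
    return scale_y, scale_x, stretch


# A hypothetical 448x320 (height x width) image squashed to 224x224:
# the height is halved while the width shrinks by less, so objects
# end up looking wider than they really are.
scale_y, scale_x, stretch = squash_distortion(448, 320)
print(scale_y, scale_x, stretch)
```

A stretch close to 1 suggests plain resizing is mostly harmless; the further it is from 1, the more the concern about changed labels applies.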

So now I resize the image to, say, 224x160, keeping the original aspect ratio, and then pad it with 0s (by pasting it at a random location in a completely black 224x224 image).
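A minimal sketch of that resize-then-pad step, using NumPy only. The helper names `resize_nearest` and `letterbox_random` are made up for this example, and the nearest-neighbour resize is a stand-in chosen to keep the code self-contained; in practice you would probably use an image library's resampler instead.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return img[rows[:, None], cols]


def letterbox_random(img, size=224, rng=None):
    """Resize so the longer side equals `size`, keeping the aspect ratio,
    then paste the result at a random location on a black size x size canvas.

    Returns the canvas and the placement (top, left, new_h, new_w).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h = max(1, min(size, round(h * scale)))
    new_w = max(1, min(size, round(w * scale)))
    resized = resize_nearest(img, new_h, new_w)

    canvas = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    top = int(rng.integers(0, size - new_h + 1))
    left = int(rng.integers(0, size - new_w + 1))
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, (top, left, new_h, new_w)


# Example: a hypothetical 448x320 white image becomes 224x160 inside 224x224.
image = np.full((448, 320, 3), 255, dtype=np.uint8)
padded, placement = letterbox_random(image, size=224)
print(padded.shape, placement)
```

Passing a seeded `rng` makes the placement reproducible, which helps when comparing training runs.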

My approach doesn't seem original to me, and yet I cannot find any information whatsoever about my approach versus the "usual" approach. Funky!

So, which approach is better? Why? (if the answer is data dependent, please share your thoughts regarding when one is preferable to the other.)

741

asked Dec 07 '17 14:12

Yoni Keren

2 Answers

According to Jeremy Howard, padding a big piece of the image (64x160 pixels) will have the following effect: the CNN will have to learn that the black part of the image is not relevant and does not help distinguish between the classes (in a classification setting), as there is no correlation between the pixels in the black part and membership of a given class. As you are not hard-coding this, the CNN will have to learn it by gradient descent, and this will probably take some epochs. For this reason, you can pad if you have lots of images and computational power, but if you are short on either, resizing should work better.

132

answered Oct 17 '22 21:10

David Masip

Sorry, this is late but this answer is for anyone facing the same issue.

First, if rescaling in a way that changes the aspect ratio would distort features that matter for your task, then you have to use zero-padding.

Zero-padding does slow down learning, but not because of the large black area itself. It slows learning because the unpadded image can sit at many different locations inside the padded image, since there are many ways to pad an image.

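For a region of zero pixels, the output of the convolution is zero (ignoring the bias term), and the same holds for max or average pooling over that region. You can also show that a zero input contributes nothing to the update of the weight it multiplies, because the gradient with respect to a weight is proportional to that input. This holds exactly at the first layer; deeper layers see zeros in that region only if the bias is zero and the activation maps zero to zero, as ReLU does (sigmoid does not, since sigmoid(0) = 0.5). So, in this sense, the large black area contributes little or nothing to the weight updates.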
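The first-layer claim can be checked numerically. The sketch below is only an illustration under simplifying assumptions: a single-channel 8x8 input, one 3x3 filter, no bias, and hand-written loops instead of a deep learning framework. The names `conv2d_valid` and `weight_grad` are made up for this example.

```python
import numpy as np


def conv2d_valid(x, w):
    """Plain 'valid' cross-correlation of a 2D input with a 2D kernel, no bias."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def weight_grad(x, upstream, kernel_shape):
    """Gradient of the loss w.r.t. the kernel, given the upstream gradient
    on the output: every output position adds upstream * its input patch."""
    kh, kw = kernel_shape
    dw = np.zeros(kernel_shape)
    for i in range(upstream.shape[0]):
        for j in range(upstream.shape[1]):
            dw += upstream[i, j] * x[i:i + kh, j:j + kw]
    return dw


rng = np.random.default_rng(0)
x = np.zeros((8, 8))
x[:, :5] = rng.random((8, 5))        # real content on the left, black padding on the right
w = rng.standard_normal((3, 3))

out = conv2d_valid(x, w)
# Output column 5 looks only at input columns 5..7, which are all black.
print(np.all(out[:, 5] == 0))

upstream = rng.standard_normal(out.shape)
dw = weight_grad(x, upstream, w.shape)

# Change the upstream gradient only at the all-black output positions:
# the kernel gradient is unchanged, because those input patches are zero.
upstream_changed = upstream.copy()
upstream_changed[:, 5] += 100.0
dw_changed = weight_grad(x, upstream_changed, w.shape)
print(np.allclose(dw, dw_changed))
```

Output positions whose window straddles the border between content and padding still see nonzero pixels, so they do contribute to the gradient.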

However, the position of the unpadded image inside the padded image does affect training. This is due not to the convolution or pooling layers but to the final fully connected layer(s). For example, suppose that when the unpadded image sits on the left of the padded image, flattening the output of the last convolution or pooling layer gives [1, 0, 0], and that when the same unpadded image sits on the right, it gives [0, 0, 1]. The fully connected layer(s) must then learn that [1, 0, 0] and [0, 0, 1] mean the same thing for the classification problem.
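A toy version of the [1, 0, 0] versus [0, 0, 1] example, as a sketch only. The weight vectors are invented for illustration: `w_untrained` stands for a fully connected layer that has not yet learned to ignore position, and `w_position_blind` for one that has.

```python
import numpy as np


def place(feature, width, offset):
    """Put a 1D feature response at `offset` inside a zero vector of length `width`."""
    flat = np.zeros(width)
    flat[offset:offset + len(feature)] = feature
    return flat


feature = np.array([1.0])
left = place(feature, 3, 0)    # image pasted on the left
right = place(feature, 3, 2)   # same image pasted on the right

# Hypothetical weights of a single output unit in the fully connected layer.
w_untrained = np.array([0.9, 0.1, -0.4])
w_position_blind = np.array([0.7, 0.7, 0.7])

# Same image, different position: the untrained layer scores them differently.
print(left @ w_untrained, right @ w_untrained)

# Only after the layer has learned equal weights for every position
# does the same image get the same score wherever it is pasted.
print(left @ w_position_blind, right @ w_position_blind)
```

Gradient descent has to discover something like the second weight vector from examples, which is why the network needs to see the image in several positions.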

Therefore, learning invariance to the different possible positions of the image is what makes training take more time. If you have 1,000,000 images, then after resizing you still have 1,000,000 images; if you pad instead and want to cover different possible locations (say 10 random placements per image), you end up with 10,000,000 training examples. That is, each pass over the data will take roughly 10 times longer.
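The arithmetic can be written out as a small sketch. Both helper names are made up for this example, and the 224x160 image inside a 224x224 canvas is taken from the question.

```python
def num_placements(canvas_h, canvas_w, img_h, img_w):
    """Number of distinct pixel-aligned positions of an image inside a canvas."""
    return (canvas_h - img_h + 1) * (canvas_w - img_w + 1)


def padded_dataset_size(num_images, placements_per_image):
    """Training examples per epoch when each image is padded several ways."""
    return num_images * placements_per_image


# A 224x160 image can only slide along one axis of a 224x224 canvas.
print(num_placements(224, 224, 224, 160))

# 1,000,000 images with 10 random placements each.
print(padded_dataset_size(1_000_000, 10))
```

Sampling a fresh random placement each epoch, instead of materialising every placement up front, keeps the per-epoch cost the same as resizing, at the price of needing more epochs to cover the positions.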

That said, it depends on your problem and what you want to achieve. Also, testing both methods will not hurt.

answered Oct 17 '22 20:10

Talal Alrawajfeh

Tags: image, machine-learning, neural-network, computer-vision, conv-neural-network
