Random cropping and flipping in convolutional neural networks

In a lot of research papers I read about Convolutional Neural Networks (CNNs), I see that people randomly crop a square region (e.g. 224x224) from the images and then randomly flip it horizontally. Why is this random cropping and flipping done? Also, why do people always crop a square region? Can CNNs not work on rectangular regions?

588

asked Sep 29 '15 11:09

chronosynclastic

2 Answers

This is referred to as data augmentation. By applying transformations to the training data, you're adding synthetic data points. This exposes the model to additional variations without the cost of collecting and annotating more data. This can have the effect of reducing overfitting and improving the model's ability to generalize.

The intuition behind flipping an image is that an object should be just as recognizable in its mirror image. Horizontal flipping is the kind most often used. Vertical flipping doesn't always make sense, but this depends on the data.

The idea behind cropping is to reduce the contribution of the background to the CNN's decision. That's useful if you have labels for locating where your object is. This lets you use surrounding regions as negative examples and build a better detector. Random cropping can also act as a regularizer, because it pushes the classifier to rely on the presence of parts of the object instead of focusing everything on one very distinct feature that may not always be present.
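As a rough illustration of the two transformations described above, here is a minimal library-free sketch. The function names (`random_crop`, `random_horizontal_flip`, `augment`) and the nested-list image representation are assumptions made for this example, not part of any particular framework.

```python
import random

def random_crop(image, size, rng=random):
    """Return a size x size crop taken at a uniformly random position."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size must not exceed the image dimensions")
    top = rng.randint(0, height - size)    # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]

def random_horizontal_flip(image, p=0.5, rng=random):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

def augment(image, size, rng=random):
    """Random crop followed by a random horizontal flip."""
    return random_horizontal_flip(random_crop(image, size, rng), rng=rng)

# Toy 4x6 single-channel "image" whose pixel values encode their position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
sample = augment(image, size=3, rng=random.Random(0))
```

Calling `augment` again on the same image generally gives a different crop and orientation, which is how one labelled image turns into many slightly different training samples.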

Why do people always crop a square region?

This is not a limitation of CNNs. It could be a limitation of a particular implementation, or a design choice, because assuming a square input can make it easier to optimize the implementation for speed. I wouldn't read too much into this.

CNNs with variable sized input vs. fixed input:

This is not specific to cropping to a square; it is the more general question of why the input is sometimes resized/cropped/warped before being fed into a CNN:

Something to keep in mind is that designing a CNN involves deciding whether to support variable-sized input or not. Convolution operations, pooling and non-linearities will work for any input dimensions. However, when you use CNNs for image classification you usually end up with one or more fully-connected layers, such as logistic regression or an MLP. The fully-connected layer is how the CNN produces a fixed-size output vector. Because its weights expect a fixed number of input features, the fixed-size output can restrict the CNN to a fixed-size input.
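To make the restriction concrete, the sketch below counts the features that reach the fully-connected layer for two input sizes. The tiny network (one 3x3 convolution with stride 1 and no padding, then 2x2 max pooling with stride 2, 8 output channels) is a made-up example, and the helper names are hypothetical; only the output-size formula is the standard one.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(height, width, channels=8):
    """Feature count entering the fully-connected layer of a hypothetical net:
    3x3 conv (stride 1, no padding) -> 2x2 max pool (stride 2)."""
    h = conv_output_size(conv_output_size(height, kernel=3), kernel=2, stride=2)
    w = conv_output_size(conv_output_size(width, kernel=3), kernel=2, stride=2)
    return channels * h * w

fc_in_features = flattened_features(32, 32)   # 8 * 15 * 15 = 1800
other = flattened_features(48, 40)            # 8 * 23 * 19 = 3496
```

A fully-connected layer whose weight matrix was built for 1800 input features cannot be applied to a vector of 3496 features, even though every layer before it handled the larger image without complaint.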

There are definitely workarounds that allow for variable-sized input and still produce a fixed-size output. The simplest is to use a convolution layer to perform classification over regular patches in an image. This idea has been around for a while. The intention behind it was to detect multiple occurrences of objects in the image and classify each occurrence. The earliest example I can think of is the work by Yann LeCun's group in the 1990s to simultaneously classify and localize digits in a string. This is referred to as turning a CNN with fully-connected layers into a fully convolutional network. More recent examples of fully convolutional networks are applied to semantic segmentation, where each pixel in an image is classified. Here the output is required to match the dimensions of the input. Another solution is to use global pooling at the end of a CNN to turn variable-sized feature maps into a fixed-size output. The size of the pooling window is set equal to the size of the feature map computed by the last conv layer.
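The global pooling idea can be sketched in a few lines. This is a plain-Python illustration, assuming feature maps stored as nested lists in (C, H, W) order; `global_average_pool` is a name chosen for this example.

```python
def global_average_pool(feature_maps):
    """Average each channel over all spatial positions.

    feature_maps: list of C channels, each a list of rows (H x W).
    Returns a list of C floats, whatever H and W are.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

small = [[[1, 2], [3, 4]], [[0, 0], [4, 4]]]               # C=2, 2x2 maps
large = [[[1, 1, 1], [1, 1, 1]], [[2, 4, 6], [2, 4, 6]]]   # C=2, 2x3 maps

vec_small = global_average_pool(small)   # [2.5, 2.0]
vec_large = global_average_pool(large)   # [1.0, 4.0]
```

Both results have length 2, one value per channel, so a classifier placed after the pooling step sees the same number of features regardless of the spatial size of the input.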

110

answered Oct 09 '22 02:10

ypx

@ypx has already given a good answer on why data augmentation is needed. I am going to share more information about why people use square images of fixed size as input.

Why fixed size input image?

If you have basic knowledge of convolutional neural networks, you will know that convolutional, pooling and non-linearity layers are fine with input images of variable size. But neural networks usually have fully-connected layers as classifiers, and the shape of the weight between the last conv layer and the first fully-connected layer is fixed. If you give the network an input image of a different size, there will be a problem because the feature map size and the weight no longer match. That is one reason a fixed-size input image is used.

Another reason is that fixing the image size can reduce the training time of neural networks. This is because most (if not all) deep learning packages are written to process a batch of images in tensor format (usually of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width of the image). If your input images do not have a fixed size, you cannot pack them into a batch. Even if your network can process variable-size input images, you still have to feed it 1 image at a time. This is slower than batch processing.
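The batching constraint can be illustrated without any framework. In this sketch images are nested lists in (C, H, W) order, and `image_shape`, `make_batch` and `blank` are hypothetical helpers written for the example; real packages enforce the same rule when they stack images into one dense tensor.

```python
def image_shape(image):
    """Shape (C, H, W) of an image stored as nested lists."""
    return (len(image), len(image[0]), len(image[0][0]))

def make_batch(images):
    """Pack images into one (N, C, H, W) batch; all shapes must agree."""
    shapes = {image_shape(img) for img in images}
    if len(shapes) != 1:
        raise ValueError(f"cannot batch images of different shapes: {sorted(shapes)}")
    return list(images), (len(images),) + shapes.pop()

def blank(c, h, w):
    """Hypothetical helper: an all-zero image of shape (C, H, W)."""
    return [[[0] * w for _ in range(h)] for _ in range(c)]

# Two images of the same size pack into a batch of shape (2, 3, 4, 4).
batch, shape = make_batch([blank(3, 4, 4), blank(3, 4, 4)])

# Mixing sizes fails, which is why images are cropped or resized first:
# make_batch([blank(3, 4, 4), blank(3, 4, 5)])  raises ValueError
```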

Can we use variable size input image?

Yes. As long as you can produce a fixed-size input for the fully-connected layers, the input image size does not matter. A good choice is adaptive pooling, which produces fixed-size output feature maps from variable-size input feature maps. Right now, PyTorch provides two adaptive pooling layers for images, AdaptiveMaxPool2d and AdaptiveAvgPool2d. You can use these layers to construct a neural network that accepts variable-size input images.
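To show what adaptive pooling does conceptually, here is a plain-Python sketch of the average variant for a single channel. It is not PyTorch's implementation: the function names are made up for this example, and the rule used to place the pooling windows (floor for the start, ceiling for the end of each bin) is an assumption that may differ in detail from what a given library does.

```python
def bin_edges(size, bins):
    """Split range(size) into `bins` windows that together cover it fully."""
    return [((i * size) // bins, ((i + 1) * size + bins - 1) // bins)
            for i in range(bins)]

def adaptive_avg_pool(feature_map, out_h, out_w):
    """Average-pool one H x W map down to out_h x out_w for any H and W."""
    rows = bin_edges(len(feature_map), out_h)
    cols = bin_edges(len(feature_map[0]), out_w)
    pooled = []
    for r0, r1 in rows:
        pooled_row = []
        for c0, c1 in cols:
            window = [feature_map[r][c]
                      for r in range(r0, r1) for c in range(c0, c1)]
            pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled

a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]   # 4x4
b = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]     # 5x3

out_a = adaptive_avg_pool(a, 2, 2)   # [[3.5, 5.5], [11.5, 13.5]]
out_b = adaptive_avg_pool(b, 2, 2)   # also 2x2, windows overlap slightly
```

Both outputs are 2x2 even though the inputs are 4x4 and 5x3, so the flattened result always has the same length and can feed a fully-connected layer of fixed size.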

answered Oct 09 '22 02:10

jdhao

Tags:

image-processing

neural-network

conv-neural-network