 

Why are inputs for convolutional neural networks always square images?

I have been doing deep learning with CNNs for a while and I realize that the inputs for a model are always square images.

I see that neither the convolution operation nor the neural network architecture itself requires such a property.

So, what is the reason for that?

asked Aug 16 '16 at 10:08 by T Nguyen



3 Answers

Because square images are pleasing to the eye. But there are applications with non-square images when the domain requires it. For instance, the original SVHN dataset consists of images of several digits, so rectangular images are used as input to the convnet, as here.
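
The question's observation that convolution itself does not need a square input can be checked with a tiny sketch. The pure-Python `conv2d_valid` below is a hypothetical helper (not from any library) computing an unpadded, stride-1 2D cross-correlation, the operation most deep learning libraries call convolution. The 4x10 input is made up for illustration, loosely like a strip of several digits.

```python
def conv2d_valid(image, kernel):
    """Naive single-channel 'valid' 2D cross-correlation (no padding, stride 1)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1  # works for any height and width
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical rectangular input: 4 rows (height) x 10 columns (width).
image = [[float(r * 10 + c) for c in range(10)] for r in range(4)]
# 3x3 horizontal-gradient filter.
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 2 8
```

The kernel slides over the input in both directions independently, so a 4x10 input simply yields a 2x8 feature map; nothing in the operation refers to the aspect ratio.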

answered Oct 05 '22 at 03:10 by Yaroslav Bulatov



From Suhas Pillai:

The problem is not with the convolutional layers; it's the fully connected layers of the network, which require a fixed number of input neurons. For example, take a small three-layer network plus a softmax layer. Suppose the first two layers are convolution + max pooling, the convolutions keep the spatial dimensions unchanged, and each pooling step halves them (dim/2), which is usually the case. For a 3*32*32 (C,W,H) image with 4 filters in the first layer and 6 filters in the second, the output at the end of the second layer will be 6*8*8, whereas for a 3*64*64 image it will be 6*16*16. Before the fully connected layer, we flatten this into a single vector (6*8*8 = 384 neurons) and apply the fully connected operation. So the same fully connected layer cannot serve images of different sizes, because its weight matrix expects a fixed input length.

One way to tackle this is spatial pyramid pooling, where you force the output of the last convolutional layer to be pooled into a fixed number of bins (i.e. neurons), so that the fully connected layer always receives the same number of inputs. You can also look at fully convolutional networks, which can take non-square images.
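
The shape arithmetic above, and how a pyramid of fixed-size pooling grids removes the dependence on input size, can be sketched in plain Python. This is an illustrative sketch, not the answer's code: `flattened_size`, `max_pool_bins`, `spatial_pyramid_pool` and `make_fmaps` are hypothetical helpers, the pyramid levels (1, 2, 4) are an arbitrary choice, and the floor/ceil bin-boundary rule is one reasonable way to split a map of any size into n bins.

```python
def flattened_size(c, h, w, filters=(4, 6)):
    """Length of the vector handed to the first fully connected layer."""
    for n_filters in filters:
        # A 'same' convolution keeps h and w and sets the channel count to
        # n_filters; 2x2 max pooling with stride 2 then halves h and w.
        c, h, w = n_filters, h // 2, w // 2
    return c * h * w

print(flattened_size(3, 32, 32))  # 384  (6*8*8)
print(flattened_size(3, 64, 64))  # 1536 (6*16*16)


def ceil_div(a, b):
    return -(-a // b)

def max_pool_bins(fmap, bins_h, bins_w):
    """Max-pool one 2D feature map into a fixed bins_h x bins_w grid,
    whatever its height and width (bins may overlap by one cell)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins_h):
        r0, r1 = (i * h) // bins_h, ceil_div((i + 1) * h, bins_h)
        for j in range(bins_w):
            c0, c1 = (j * w) // bins_w, ceil_div((j + 1) * w, bins_w)
            out.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out

def spatial_pyramid_pool(fmaps, levels=(1, 2, 4)):
    """Concatenate n x n max-pooled bins for every pyramid level and channel."""
    features = []
    for n in levels:
        for fmap in fmaps:
            features.extend(max_pool_bins(fmap, n, n))
    return features

def make_fmaps(channels, h, w):
    """Dummy feature maps standing in for the last conv layer's output."""
    return [[[float(k * 1000 + r * w + c) for c in range(w)] for r in range(h)]
            for k in range(channels)]

for shape in [(6, 8, 8), (6, 16, 16), (6, 8, 12)]:
    print(shape, len(spatial_pyramid_pool(make_fmaps(*shape))))  # 126 each time
```

Without the pyramid, the fully connected layer would need 384 inputs for one image size and 1536 for the other. With it, every input (including the non-square 8x12 map) produces 6 * (1 + 4 + 16) = 126 features, so one fully connected layer fits all of them.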

answered Oct 05 '22 at 04:10 by T Nguyen



It is not necessary to have square images. I see two "reasons" for them:

  • scaling: if images with other aspect ratios (both landscape and portrait) are rescaled automatically, a square target might on average introduce the least error
  • publications / visualizations: square images are easy to display together
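
The scaling argument can be made concrete under an assumed error measure. The sketch below scores resizing an image of aspect ratio a (width / height) to a target aspect t by (ln(a/t))^2, a measure chosen here for illustration and not taken from the answer, and averages it over a hypothetical, evenly split mix of 4:3 landscape and 3:4 portrait images.

```python
import math

def sq_log_distortion(src_aspect, dst_aspect):
    """Squared log of the relative stretch between the axes when an image with
    aspect ratio src_aspect (width / height) is resized to dst_aspect.
    0 means no distortion; stretching by k or by 1/k scores the same."""
    return math.log(src_aspect / dst_aspect) ** 2

def mean_distortion(src_aspects, dst_aspect):
    return sum(sq_log_distortion(a, dst_aspect) for a in src_aspects) / len(src_aspects)

# Hypothetical dataset: equally many landscape (4:3) and portrait (3:4) images.
sources = [4 / 3, 3 / 4]
targets = [3 / 4, 1.0, 4 / 3, 16 / 9]

for t in targets:
    print(f"target aspect {t:.2f}: mean distortion {mean_distortion(sources, t):.3f}")

best = min(targets, key=lambda t: mean_distortion(sources, t))
print("least distortion at aspect", best)  # square (1.0) for this symmetric mix
```

Under this measure the best single target is the geometric mean of the source aspect ratios, which is exactly 1 whenever landscape and portrait images mirror each other; with a skewed mix (say mostly 16:9 frames) a non-square target would win.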
answered Oct 05 '22 at 02:10 by Martin Thoma
