I have been doing deep learning with CNNs for a while, and I have noticed that the inputs to a model are always square images.
As far as I can see, neither the convolution operation nor the neural network architecture itself requires this property.
So, what is the reason for that?
Because square images are pleasing to the eye. But there are applications that use non-square images when the domain requires it. For instance, the original SVHN dataset consists of images containing several digits, so rectangular images are used as input to the convnet, as here.
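To illustrate that the convolution itself places no constraint on aspect ratio, here is a minimal pure-Python sketch of a "valid" 2-D convolution (strictly a cross-correlation, which is what CNN layers compute). The function name `conv2d_valid` and the 4 x 10 toy input, standing in for a strip of several digits, are hypothetical choices made for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    Both arguments are lists of rows. Nothing here assumes the image is
    square: an H x W input and a kh x kw kernel give an
    (H - kh + 1) x (W - kw + 1) output.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(w - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


# Hypothetical 4 x 10 "multi-digit strip" filled with ones, and a 3 x 3 kernel of ones.
strip = [[1] * 10 for _ in range(4)]
kernel = [[1] * 3 for _ in range(3)]
out = conv2d_valid(strip, kernel)
print(len(out), len(out[0]))  # 2 8
```

The same kernel weights are applied at every position, so the layer's parameters do not depend on the input's height or width; only the size of the output feature map does.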
From Suhas Pillai:
The problem is not with the convolutional layers; it is with the fully connected layers of the network, which require a fixed number of input neurons. For example, take a small 3-layer network plus a softmax layer. Suppose the first 2 layers are convolution + max pooling, the convolutions keep the spatial dimensions unchanged, and each pooling step halves them (dim/2), which is usually the case. For an input image of 3*32*32 (C, W, H) with 4 filters in the first layer and 6 filters in the second layer, the output after convolution + max pooling at the end of the 2nd layer will be 6*8*8, whereas for a 3*64*64 image the output at the end of the 2nd layer will be 6*16*16. Before the fully connected layer, we flatten this into a single vector (6*8*8 = 384 neurons for the smaller image, 6*16*16 = 1536 for the larger one) and apply the fully connected operation. Since the weight matrix of a fully connected layer has a fixed input size, the same layer cannot accept both vectors, so you cannot feed images of different sizes into it. One way to tackle this is spatial pyramid pooling, where you pool the output of the last convolutional layer into a fixed number of bins (i.e., neurons) so that the fully connected layer always receives the same number of inputs. You can also look at fully convolutional networks, which can take non-square images.
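The shape arithmetic above, and the way spatial pyramid pooling removes the dependence on input size, can be sketched in plain Python. This is a minimal illustration under the answer's assumptions (size-preserving convolutions, pooling that halves each dimension). The helper names are hypothetical, and the bin placement used here (floor for the start index, ceiling for the end index, so every bin is non-empty) is one reasonable choice, not necessarily the exact scheme of any particular SPP implementation.

```python
def conv_pool_output_shape(channels, height, width, filters_per_layer):
    """Shape after stacked conv + 2x2 max-pooling layers.

    Assumes, as in the answer, that each convolution preserves the spatial
    size and each pooling step halves it (integer division).
    """
    c, h, w = channels, height, width
    for filters in filters_per_layer:
        c = filters             # the convolution only changes the channel count
        h, w = h // 2, w // 2   # 2x2 pooling halves height and width
    return c, h, w


def flattened_size(shape):
    """Number of neurons after flattening a (C, H, W) feature map."""
    c, h, w = shape
    return c * h * w


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map (list of rows) into a fixed-length list.

    For each level n, the map is split into an n x n grid of bins whose
    boundaries scale with the input size, so the output length is
    sum(n * n for n in levels) whatever the height and width.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n  # floor start, ceil end
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


def spp_features(feature_maps, levels=(1, 2, 4)):
    """Apply spatial_pyramid_pool to every channel and concatenate the results."""
    return [v for fmap in feature_maps for v in spatial_pyramid_pool(fmap, levels)]


# Flattened size depends on the input size...
for size in (32, 64):
    shape = conv_pool_output_shape(3, size, size, [4, 6])
    print(size, shape, flattened_size(shape))

# ...but the pyramid-pooled size does not, even for non-square maps.
for h, w in [(8, 8), (16, 16), (10, 23)]:
    # Six channels of dummy activations, as after the 2nd layer.
    maps = [[[ch + r * w + c for c in range(w)] for r in range(h)] for ch in range(6)]
    print((h, w), len(spp_features(maps)))  # 126 every time
```

With levels (1, 2, 4), each channel contributes 1 + 4 + 16 = 21 values, so a fully connected layer placed after this pooling always receives 6 * 21 = 126 inputs, whereas plain flattening gives 384 inputs for the 32*32 image and 1536 for the 64*64 one.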
It is not necessary to have square images. I see two "reasons" for it: