I'm new to deep learning. I'm planning to use Caffe and am preparing a dataset for training.
Do all the images have to have the same size? And does it have to be a square?
If so, what would be the ideal size, or how should I choose it?
Resizing images is a critical preprocessing step in computer vision. Principally, our machine learning models train faster on smaller images. An input image that is twice as large in each dimension contains four times as many pixels for the network to learn from, and that time adds up.
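A quick check of the "four times as many pixels" arithmetic, assuming "twice as large" means twice the width and twice the height. The helper name `pixel_count` is hypothetical and only used for illustration:

```python
def pixel_count(width, height):
    """Number of pixel positions in one channel of a width x height image."""
    return width * height


base = pixel_count(224, 224)
doubled = pixel_count(448, 448)  # twice the width and twice the height

print(base, doubled, doubled / base)  # 50176 200704 4.0
```

Every convolutional filter has to visit each of those positions, so the work per layer grows with the pixel count rather than with the side length.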
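Convolutional neural networks typically require identical image sizes to work properly, for example because images are stacked into fixed-shape batches, or because fully connected layers expect an input of fixed length.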
TL;DR: The best way to deal with differently sized images is to downscale them to match the dimensions of the smallest image available. If you read our last post, you know that CNNs are able to learn information from images even if their channels are flipped, at a cost in model accuracy.
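A minimal sketch of picking that target size, assuming "the dimensions of the smallest image" means the smallest width and the smallest height found in the dataset. The function names and the example sizes are hypothetical; the actual downscaling would be done with an image library and is not shown:

```python
def smallest_dimensions(sizes):
    """Return (min_width, min_height) over a list of (width, height) pairs."""
    min_w = min(w for w, _ in sizes)
    min_h = min(h for _, h in sizes)
    return min_w, min_h


def needs_stretching(size, target):
    """True if resizing `size` to `target` would change the aspect ratio."""
    w, h = size
    tw, th = target
    return w * th != h * tw  # compare w/h with tw/th without division


dataset_sizes = [(640, 480), (1024, 768), (320, 240), (500, 500)]
target = smallest_dimensions(dataset_sizes)
stretched = [s for s in dataset_sizes if needs_stretching(s, target)]

print(target)     # (320, 240)
print(stretched)  # [(500, 500)]
```

If the images do not all share one aspect ratio, forcing every image to this size will stretch some of them, so it is worth checking which ones are affected before resizing.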
For deep learning in general, this does not have to be the case. Convolutional layers do not depend on the image size: the same filters can be applied to images of any size.
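To illustrate that a convolutional filter does not care about the input size, here is a small pure-Python sketch of a stride-1, unpadded ("valid") 2D convolution. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped). The function names and the example kernel are illustrative choices, not Caffe's implementation:

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Works for any image that is at least as large as the kernel.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def shape(m):
    """(rows, columns) of a nested list."""
    return (len(m), len(m[0]))


# One 3x3 filter with fixed weights...
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]

# ...applied to a 5x5 image and to a 4-row by 6-column image.
small = [[1] * 5 for _ in range(5)]
wide = [[1] * 6 for _ in range(4)]

print(shape(conv2d_valid(small, kernel)))  # (3, 3)
print(shape(conv2d_valid(wide, kernel)))   # (2, 4)
```

The output size is (H - k + 1) x (W - k + 1), so it follows the input size. What usually forces a fixed input size is a later layer that expects a fixed-length vector, such as a fully connected layer.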
Still, many frameworks and virtually all papers use a single fixed image size for training. In https://arxiv.org/pdf/1409.1556/ the authors used different image sizes for evaluating the network. To achieve this, you can use resizing, cropping, or a combination of both. Keep in mind that changing the aspect ratio is almost always a bad idea.
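One common way to combine resizing and cropping without changing the aspect ratio is to scale the shorter side to the target size and then take a centered square crop. The sketch below only computes the geometry; the function name is hypothetical, and the crop box uses the (left, top, right, bottom) convention:

```python
def resize_then_center_crop(width, height, target):
    """Plan an aspect-preserving resize followed by a centered square crop.

    The shorter side is scaled to `target`, then a target x target box is
    cut from the middle of the resized image. Returns the intermediate
    (width, height) and the crop box as (left, top, right, bottom).
    """
    scale = target / min(width, height)
    # max() guards against the shorter side rounding to just below target.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


print(resize_then_center_crop(640, 480, 224))  # ((299, 224), (37, 0, 261, 224))
print(resize_then_center_crop(300, 600, 224))  # ((224, 448), (0, 112, 224, 336))
```

With Pillow, the two results could be passed to `Image.resize` and `Image.crop`, which take a (width, height) size and a (left, upper, right, lower) box respectively. The crop discards a strip at the borders, but nothing is stretched.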
When choosing an image size, note that larger images normally give better accuracy. However, every filter takes longer to apply, and the memory requirements grow with the image size. Additionally, larger sizes yield diminishing improvements. I normally use 224x224, because it can be halved five times without remainder (224 = 7 x 32) and the standard ImageNet models use it too.
Finally, the image size does not have to be square, but a square is usually a good idea: CNNs repeatedly cut the spatial size in half and often end up at something like 4x4 or 6x6. Starting from a non-square size gives you an awkward final size such as 4x2 or 6x3.
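A small sketch of how repeated halving plays out for square and non-square inputs. It assumes each downsampling step exactly halves both sides (as a 2x2, stride-2 pooling does on even sizes); frameworks differ in how they round odd sizes, which this sketch ignores by rounding down. The function name is hypothetical:

```python
def halving_sizes(height, width, times):
    """Spatial size (height, width) before and after each halving step."""
    sizes = [(height, width)]
    for _ in range(times):
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes


print(halving_sizes(224, 224, 5)[-1])  # (7, 7)
print(halving_sizes(64, 64, 4)[-1])    # (4, 4)
print(halving_sizes(96, 96, 4)[-1])    # (6, 6)
print(halving_sizes(64, 32, 4)[-1])    # (4, 2)
print(halving_sizes(96, 48, 4)[-1])    # (6, 3)
```

The 224x224 case stays even through all five steps (224, 112, 56, 28, 14, 7), which is one reason it divides so cleanly.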