I am trying to model a fully convolutional neural network using the Keras library, Tensorflow backend.
The issue I face is that of feeding ]differently sized images in batches to model.fit()
function. The training set consists of images of different sizes varying from 768x501 to 1024x760.
Not more than 5 images have the same dimensions, so grouping them into batches seems to be of no help.
Numpy allows storing the data in a single variable in list form. But the keras model.fit()
function throws an error on receiving a list type training array.
I do not wish to resize and lose the data as I already have a very small dataset.
How do I go about training this network?
Conventionally, when dealing with images of different sizes in CNN(which happens very often in real world problems), we resize the images to the size of the smallest images with the help of any image manipulation library (OpenCV, PIL etc) or some times, pad the images of unequal size to desired size.
Usually around 100 images are sufficient to train a class. If the images in a class are very similar, fewer images might be sufficient. the training images are representative of the variation typically found within the class.
If we pad the image by (F — 1)/2 pixels on all sides, the size of N x N will be preserved. Thus we have two types of convolutions, Valid Convolution and Same Convolution. Valid essentially means no padding. So each Convolution results in reduction in the size.
Since neural networks receive inputs of the same size, all images need to be resized to a fixed size before inputting them to the CNN [14]. The larger the fixed size, the less shrinking required. Less shrinking means less deformation of features and patterns inside the image.
I think Spatial Pyramid Pooling (SPP) might be helpful. Checkout this paper.
We note that SPP has several remarkable properties for deep CNNs:
1) SPP is able to generate a fixed-length output regardless of the input size, while the sliding window pooling used in the previous deep networks cannot;
2) SPP uses multi-level spatial bins, while the sliding window pooling uses only a single window size. Multi-level pooling has been shown to be robust to object deformations;
3) SPP can pool features extracted at variable scales thanks to the flexibility of input scales. Through experiments we show that all these factors elevate the recognition accuracy of deep networks.
yhenon
has implemented SPP for Keras on Github.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With