
How to handle variable sized input in CNN with Keras?


I am trying to perform the usual classification on the MNIST database, but with randomly cropped digits. Images are cropped as follows: the first and/or last rows and/or columns are randomly removed.

I would like to use a Convolutional Neural Network with Keras (and a TensorFlow backend) to perform the convolution and then the usual classification.

The inputs are of variable size and I can't manage to get it to work.

Here is how I cropped the digits:

import numpy as np
from keras.utils import to_categorical
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

X = digits.images
X = np.expand_dims(X, axis=3)

X_crop = list()
for index in range(len(X)):
    X_crop.append(X[index,
                    np.random.randint(0, 2):np.random.randint(7, 9),
                    np.random.randint(0, 2):np.random.randint(7, 9),
                    :])
X_crop = np.array(X_crop)

y = to_categorical(digits.target)

X_train, X_test, y_train, y_test = train_test_split(X_crop, y, train_size=0.8, test_size=0.2)

And here is the architecture of the model I want to use:

from keras.layers import Dense, Dropout
from keras.layers.convolutional import Conv2D
from keras.models import Sequential

model = Sequential()

model.add(Conv2D(filters=10,
                 kernel_size=(3, 3),
                 input_shape=(None, None, 1),
                 data_format='channels_last'))

model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model.summary()

model.fit(X_train, y_train, epochs=100, batch_size=16, validation_data=(X_test, y_test))
  1. Does anyone have an idea how to handle variable-sized inputs in my neural network?

  2. And how to perform the classification?

Thomas Grsp asked Sep 13 '17



2 Answers

TL;DR - go to point 4.

So - before we get to the point - let's fix some problems with your network:

  1. Your network will not work because of the activation: with categorical_crossentropy you need a softmax activation on the final layer:

    model.add(Dense(10, activation='softmax')) 
  2. Vectorize spatial tensors: as Daniel mentioned, you need to switch, at some stage, from spatial tensors (images) to vectors. Currently, applying Dense to the output of a Conv2D is equivalent to a (1, 1) convolution, so the output of your network is still spatial, not vectorized, which causes a dimensionality mismatch (you can check this by running your network or by inspecting model.summary()). To change that, use either GlobalMaxPooling2D or GlobalAveragePooling2D. E.g.:

    from keras.layers import GlobalMaxPooling2D

    model.add(Conv2D(filters=10,
                     kernel_size=(3, 3),
                     input_shape=(None, None, 1),
                     padding="same",
                     data_format='channels_last'))
    model.add(GlobalMaxPooling2D())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation='softmax'))
  3. Concatenated numpy arrays need to have the same shape: if you check the shape of X_crop you'll see that it's not a 4D spatial array. That's because you concatenated arrays with different shapes. Sadly, it's impossible to overcome this issue, as numpy.array needs a fixed shape.

  4. How to make your network train on examples of different shapes: The most important thing here is to understand two things. First, within a single batch every image must have the same size. Second, calling fit multiple times is a bad idea, as it resets inner model states. So here is what needs to be done:

    a. Write a function which crops a single batch - e.g. get_cropped_batches_generator which, given a data matrix, cuts a batch out of it and crops it randomly.

    b. Use the train_on_batch method. Here is some example code:

    from six import next  # Python 2/3 compatible next()

    # nb_of_epochs and nb_of_batches are assumed to be defined elsewhere
    batches_generator = get_cropped_batches_generator(X, y, batch_size=16)
    losses = list()
    for epoch_nb in range(nb_of_epochs):
        epoch_losses = list()
        for batch_nb in range(nb_of_batches):
            # cropped_x has a different shape for different batches (in general)
            cropped_x, cropped_y = next(batches_generator)
            # train_on_batch returns a single loss value for this batch
            # (or [loss, metrics...] if the model was compiled with metrics)
            current_loss = model.train_on_batch(cropped_x, cropped_y)
            epoch_losses.append(current_loss)
        losses.append(sum(epoch_losses) / (1.0 * len(epoch_losses)))
    final_loss = sum(losses) / (1.0 * len(losses))

So - a few comments about the code above: First, train_on_batch doesn't use the nice Keras progress bar; it returns a single loss value for the given batch, which is why I added the logic for computing the mean loss (you could also use the Progbar callback for that). Second, you need to implement get_cropped_batches_generator yourself - I left it out of the main code to keep the answer a little clearer, but a minimal sketch follows below. Last thing - I use six to keep compatibility between Python 2 and Python 3.
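Here is a minimal sketch of what such a generator could look like (an illustration only, not tested: note that it also takes the labels y, since the training loop unpacks cropped_y, and it applies the same random crop to every image in a batch so that the batch stacks into a single numpy array):

    import numpy as np

    def get_cropped_batches_generator(X, y, batch_size=16):
        # Endlessly yields (cropped_x, cropped_y) batches.
        n_samples = len(X)
        while True:
            indices = np.random.permutation(n_samples)
            for start in range(0, n_samples - batch_size + 1, batch_size):
                batch_idx = indices[start:start + batch_size]
                # One random crop shared by the whole batch, mirroring the
                # question's cropping scheme for 8x8 digits.
                top, bottom = np.random.randint(0, 2), np.random.randint(7, 9)
                left, right = np.random.randint(0, 2), np.random.randint(7, 9)
                cropped_x = X[batch_idx][:, top:bottom, left:right, :]
                yield cropped_x, y[batch_idx]

This signature matches the call in the training loop above.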

Marcin Możejko answered Oct 07 '22


Usually, a model containing Dense layers cannot have variable size inputs, unless the outputs are also variable. But see the workaround below, and also the other answer using GlobalMaxPooling2D - the workaround is equivalent to GlobalAveragePooling2D. These are layers that can eliminate the variable size before a Dense layer by suppressing the spatial dimensions.

For an image classification case, you may want to resize the images outside the model.

When my images are in numpy format, I resize them like this:

import numpy as np
from PIL import Image

im = Image.fromarray(imgNumpy)
im = im.resize(newSize, Image.LANCZOS)  # you can use options other than LANCZOS as well
imgNumpy = np.asarray(im)
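For example, to bring every cropped digit from the question back to a common size (a sketch - the 8x8 target size and the channel handling here are assumptions, not part of the original code):

import numpy as np
from PIL import Image

newSize = (8, 8)
X_resized = list()
for img in X_crop:
    im = Image.fromarray(img.squeeze())       # drop the channel axis for PIL
    im = im.resize(newSize, Image.LANCZOS)
    X_resized.append(np.asarray(im)[..., np.newaxis])  # restore the channel axis
X_resized = np.array(X_resized)               # a regular (samples, 8, 8, 1) array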

Why?

A convolutional layer has its weights as filters. There is a static filter size, and the same filter is applied to the image over and over.

But a dense layer has its weights based on the input size. If there is 1 input feature, there is one set of weights. If there are 2 input features, you've got twice as many weights. But weights must be trained, and changing the number of weights would definitely change the result of the model.

As @Marcin commented, what I've said is true when your input shape for Dense layers has two dimensions: (batchSize,inputFeatures).

But actually, Keras Dense layers can accept inputs with more dimensions. These additional dimensions (which come out of the convolutional layers) can vary in size - but that would make the output of these Dense layers variable in size as well.

Nonetheless, at the end you will need a fixed size for classification: 10 classes and that's it. For reducing the dimensions, people often use Flatten layers, and that is exactly where the error appears: Flatten needs a known input size to produce a fixed-length vector.
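A quick back-of-the-envelope check makes the problem concrete (the numbers assume the question's digits and 10 filters):

n_filters = 10
for side in (8, 7):                    # two differently cropped images
    print(side, '->', side * side * n_filters)
# prints: 8 -> 640 and 7 -> 490
# A Dense layer after Flatten would need a (640, units) kernel for one image
# and a (490, units) kernel for the other - and weights cannot change shape.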


A possible fishy workaround (not tested):

At the end of the convolutional part of the model, use a Lambda layer to condense all the values into a fixed-size tensor, probably taking a mean over the side dimensions and keeping the channels (the number of channels is not variable).

Suppose the last convolutional layer is:

model.add(Conv2D(filters, kernel_size, ...))
# its output shape is (None, None, None, filters) = (batchSize, side1, side2, filters)

Let's add a lambda layer to condense the spatial dimensions and keep only the filters dimension:

import keras.backend as K

def collapseSides(x):
    # keep only ONE of the next two lines:
    axis = 1   # if you're using the channels_last format (default)
    axis = -1  # if you're using the channels_first format

    # x has shape (batchSize, side1, side2, filters)
    step1 = K.mean(x, axis=axis)     # mean over side1
    return K.mean(step1, axis=axis)  # mean over side2
    # this results in a tensor of shape (batchSize, filters)

Since the number of filters is fixed (you have kicked out the None dimensions), the Dense layers should probably work:

model.add(Lambda(collapseSides, output_shape=(filters,)))
model.add(Dense(...))
...

In order for this to possibly work, I suggest that the number of filters in the last convolutional layer be at least 10.

With this, you can use input_shape=(None, None, 1).

If you're doing this, remember that you can only pass input data with a fixed size per batch. So you have to separate your entire data in smaller batches, each batch having images all of the same size. See here: Keras misinterprets training data shape
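One way to do that separation (a sketch, not tested - it groups the question's X_crop by image shape so that each call to train_on_batch gets a single consistent size):

from collections import defaultdict
import numpy as np

groups = defaultdict(list)
for img, label in zip(X_crop, y):
    groups[img.shape].append((img, label))

for shape, samples in groups.items():
    batch_x = np.array([s[0] for s in samples])  # every image here shares `shape`
    batch_y = np.array([s[1] for s in samples])
    model.train_on_batch(batch_x, batch_y)

In practice you would shuffle and repeat this over several epochs (and possibly split large groups into smaller batches), but the grouping idea is the key point.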

Daniel Möller answered Oct 07 '22