 

How to use CNN to train input data of different size?

CNNs seem to be implemented mostly for fixed-size input. Now I want to use a CNN to train on sentences of different sizes; what are some common methods?

Hao Tan asked Mar 28 '16 12:03


People also ask

How can a CNN take images of different sizes as inputs?

As mentioned by Media in the above answer, it is not possible to directly use images of different sizes. This is because when you define a CNN architecture, you plan how many layers to use depending on the input size. Without a fixed input shape, you cannot define the architecture of your model.

Does input size matter for CNN?

It is a common misconception that when using these pretrained CNNs, images need to be resized to 224x224. On the contrary, popular CNNs are fully convolutional nets that can accept any input size. You can input any image size, and these CNNs output feature maps that are 32x smaller.

Does the input image size affect CNNs performance?

Downscaling: bigger images will be downscaled, which makes it harder for a CNN to learn the features required for classification or detection, as the number of pixels where the vital feature is present is significantly reduced.


2 Answers

The following suggestion is mostly related to CNNs for computer vision tasks (in particular for recognition), but might also be helpful in your domain: I would have a look at "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition" by He et al., which proposes a Spatial Pyramid Pooling layer.

The general idea: the convolutional layers of a CNN (and related layers such as pooling, local response normalization, etc.) are able to process variable-sized input. Therefore, the problem of variable-sized input propagates down to the first fully connected/inner product layer, which requires a vector of fixed size. He et al. propose to add the Spatial Pyramid Pooling layer just before the first fully connected layer (details in the paper). The layer itself works by hierarchically partitioning the feature maps of the last convolutional layer (or the subsequent pooling or response normalization layer) into a fixed number of bins. Within these bins, responses are pooled as usual, creating a fixed-size output (whose size depends on the hierarchy and number of bins). Please see the paper for an illustration.
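To make the bin-partitioning idea concrete, here is a minimal NumPy sketch for a single channel. The function name and the pyramid levels (1, 2, 4) are my own choices for illustration, not taken from the paper or the SPP_net repository; the point is only that two inputs of different spatial sizes yield vectors of the same fixed length:

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a single-channel 2D feature map into a fixed-length vector.

    fmap: 2D array of any height/width.
    levels: number of bins per side at each pyramid level.
    Output length = sum(n * n for n in levels), independent of input size.
    """
    h, w = fmap.shape
    out = []
    for n in levels:
        # Bin edges chosen so every pixel falls in exactly one bin.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                out.append(fmap[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max())
    return np.array(out)

# Feature maps of different sizes map to vectors of the same length:
a = spatial_pyramid_pool(np.random.rand(13, 9))
b = spatial_pyramid_pool(np.random.rand(32, 32))
assert a.shape == b.shape == (21,)  # 1 + 4 + 16 bins
```

In a real network this runs per channel on the last convolutional feature maps, and the concatenated fixed-length vectors feed the first fully connected layer.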

The layer has been implemented based on Caffe and is available on GitHub: ShaoqingRen/SPP_net.

David Stutz answered Oct 03 '22 22:10


DynamicCNN, for Theano/Lasagne, by Fréderic Godin is an approach that might work better for sentence modelling. It is based on the 2014 paper "A Convolutional Neural Network for Modelling Sentences" by Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom.

Quoting the abstract of the mentioned paper:

The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. The network does not rely on a parse tree and is easily applicable to any language. We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.
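The two operations the abstract mentions can be sketched in NumPy. This is a hypothetical minimal illustration, not code from the DynamicCNN repository: k-max pooling keeps the k largest activations per feature while preserving their original order, and the dynamic schedule makes k depend on sentence length and layer depth (following the formula k_l = max(k_top, ceil((L - l)/L * s)) as I read it in the paper):

```python
import math
import numpy as np

def kmax_pool(seq, k):
    """Keep the k largest values along the time axis, preserving order.

    seq: (time, features) array. Output: (k, features).
    """
    idx = np.argsort(seq, axis=0)[-k:]  # indices of the k largest per feature
    idx = np.sort(idx, axis=0)          # restore original time ordering
    return np.take_along_axis(seq, idx, axis=0)

def dynamic_k(s, l, L, k_top=3):
    """k for layer l of L total conv layers, given sentence length s."""
    return max(k_top, math.ceil((L - l) / L * s))

seq = np.array([[1.0], [5.0], [3.0], [2.0], [4.0]])
print(kmax_pool(seq, 2))     # keeps 5 and 4, in their original order
print(dynamic_k(18, 1, 3))   # intermediate layers pool less aggressively
```

Because k shrinks toward k_top only at the top layer, sentences of any length end up as a fixed-size representation before the final fully connected layer.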

I haven't used it myself, but it solved a similar sentence-modelling problem on SO.

dvb answered Oct 03 '22 20:10