Pooling vs Pooling-over-time

Tags: machine-learning, neural-network, nlp, convolution, max-pooling

I understand conceptually what is happening in a max/sum pool as a CNN layer operation, but I keep seeing the terms "max pool over time" and "sum pool over time" thrown around (e.g., in the paper "Convolutional Neural Networks for Sentence Classification" by Yoon Kim). What is the difference?


asked Jan 31 '18 19:01

Matt

2 Answers

Max-over-time pooling is usually applied in NLP (unlike ordinary max pooling, which is common in CNNs for computer vision tasks), so the setup is a little different.

The input to max-over-time pooling is a feature map c = [c(1), ..., c(n-h+1)], computed over a sentence of length n with a filter of size h (a filter spanning h consecutive words fits at n-h+1 positions). The convolution is very similar to the one used on images, but here it slides along a single dimension: the sequence of word vectors. This is formula (3) in the paper.
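As a rough illustration of formula (3), here is a minimal NumPy sketch that computes the feature map for a single filter. The sentence layout (one k-dimensional word vector per row), the ReLU nonlinearity, and the names `feature_map`, `x`, `w`, `b` are assumptions for illustration, not code from the paper; the random inputs stand in for real word embeddings.

```python
import numpy as np

def feature_map(x, w, b):
    """Compute c = [c(1), ..., c(n-h+1)] for one filter (illustrative sketch).

    x: (n, k) sentence matrix, one k-dim word vector per row (assumed layout).
    w: (h, k) filter spanning h consecutive words.
    b: scalar bias.
    """
    n, h = x.shape[0], w.shape[0]
    # Slide the filter along the word axis only ("valid" convolution, no padding),
    # so there are n - h + 1 positions.
    c = np.array([np.sum(w * x[i:i + h]) + b for i in range(n - h + 1)])
    return np.maximum(c, 0.0)  # ReLU as the nonlinearity f (an assumption here)

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 5))  # hypothetical sentence: n=7 words, k=5 dims
w = rng.normal(size=(3, 5))  # hypothetical filter of size h=3
c = feature_map(x, w, 0.1)
print(c.shape)  # (5,), i.e. n - h + 1
```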

The max-over-time pooling operation is very simple: max_c = max(c), i.e., a single number, the maximum over the whole feature map. The reason to do this, instead of "down-sampling" the feature map as in an image CNN, is that sentences in a corpus naturally have different lengths. The feature maps therefore differ in size from sentence to sentence, but we'd like to reduce them to a fixed-size tensor so that a softmax or regression head can be applied at the end. With m filters, every sentence is mapped to a vector of m numbers, regardless of its length. As stated in the paper, this captures the most important feature, the one with the highest value, for each feature map.
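A minimal sketch of the point above, assuming NumPy, several filters of one size h, and a ReLU nonlinearity (all illustrative choices, not the paper's code): two sentences of different lengths yield feature maps of different lengths, but max-over-time pooling reduces both to a vector with one entry per filter. The helper names `conv_feature_maps` and `max_over_time` are hypothetical.

```python
import numpy as np

def conv_feature_maps(x, filters, bias):
    """Feature maps for m filters of the same size h (illustrative sketch).

    x: (n, k) sentence, assumed n >= h; filters: (m, h, k); bias: (m,).
    Returns an (n - h + 1, m) array: column j is filter j's feature map c.
    """
    m, h, k = filters.shape
    n = x.shape[0]
    # All windows of h consecutive word vectors: (n - h + 1, h, k).
    windows = np.stack([x[i:i + h] for i in range(n - h + 1)])
    # Contract over the window and embedding axes for every filter.
    c = np.einsum("thk,mhk->tm", windows, filters) + bias
    return np.maximum(c, 0.0)  # ReLU nonlinearity (assumed)

def max_over_time(c):
    # Collapse the time axis: one number per feature map, whatever n was.
    return c.max(axis=0)

rng = np.random.default_rng(1)
filters = rng.normal(size=(4, 3, 5))  # hypothetical: m=4 filters, h=3, k=5
bias = np.zeros(4)
short_sent = rng.normal(size=(6, 5))  # 6-word sentence
long_sent = rng.normal(size=(20, 5))  # 20-word sentence
for sent in (short_sent, long_sent):
    c = conv_feature_maps(sent, filters, bias)
    print(c.shape, "->", max_over_time(c).shape)
# (4, 4) -> (4,)
# (18, 4) -> (4,)
```

The pooled vector has length 4 (the number of filters) for both sentences, which is what makes a fixed-size output head possible.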

Note that in computer vision, images are usually¹ all the same size, like 28x28 or 32x32, which is why there is no need to downsample the feature maps to 1x1 immediately.

Sum pooling over time works the same way, except that it takes the sum of the feature map instead of its maximum.
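To make the comparison concrete, a small NumPy sketch with made-up feature-map values (rows are positions in the sentence, columns are filters) shows both reductions side by side; nothing here comes from the paper.

```python
import numpy as np

# Made-up feature maps for one sentence: 4 positions (time steps) x 2 filters.
c = np.array([[0.0, 1.0],
              [3.0, 0.5],
              [1.0, 2.0],
              [2.0, 0.5]])

max_pooled = c.max(axis=0)  # max over time: [3. 2.]
sum_pooled = c.sum(axis=0)  # sum over time: [6. 4.]
```

One practical difference, assuming non-negative features such as ReLU outputs: the sum tends to grow with sentence length, while the max does not.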

¹ Modern CNNs can be trained on images of different sizes, but this requires the network to be fully convolutional, i.e., without fully-connected layers that expect a fixed input size. See this question for more details.

answered Oct 13 '22 04:10

Maxim

Max pooling typically applies to regions in a 2d feature plane, while max pooling over time happens along a 1d feature vector.

Here is a demonstration of max pooling from Stanford's CS231n:

[Figure: max pooling demonstration from CS231n]
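For contrast with max pooling over time, here is a minimal NumPy sketch of ordinary 2D max pooling over non-overlapping 2x2 regions (stride 2), the kind of operation shown in the CS231n demonstration. The input values are chosen for illustration, and the reshape trick assumes the height and width are divisible by the window size.

```python
import numpy as np

def max_pool_2d(x, size=2):
    """Non-overlapping max pooling over size x size regions (stride == size).

    Assumes both spatial dims are divisible by `size`.
    """
    H, W = x.shape
    # View as (H/size, size, W/size, size) blocks, then take the max inside each block.
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2d(x))
# [[6 8]
#  [3 4]]
```

The output keeps its 2D layout and is simply smaller; max pooling over time would instead reduce each feature map to a single number.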

Max pooling over time takes a 1d feature vector and computes the max. The "over time" just means this is happening along the time dimension for some sequential input, like a sentence, or a concatenation of all phrases from a sentence as in the paper you linked.

For example:


[2, 7, 4, 1, 5] -> [7]

Source: CS224d Lecture 13 slides
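One way to see the relationship, sketched in NumPy with a simple sliding-window implementation (the helper `max_pool_1d` is hypothetical, not a library function): ordinary 1D max pooling takes a max over each local window, and max pooling over time is the special case where the window spans the whole sequence.

```python
import numpy as np

def max_pool_1d(v, size, stride):
    # Ordinary 1-D max pooling: the max over each window of `size` steps.
    return np.array([v[i:i + size].max() for i in range(0, len(v) - size + 1, stride)])

v = np.array([2, 7, 4, 1, 5])
print(max_pool_1d(v, size=2, stride=1))       # local windows: [7 7 4 5]
print(max_pool_1d(v, size=len(v), stride=1))  # window = whole sequence: [7]
```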

answered Oct 13 '22 04:10

Imran

Related questions

Multiple pipelines that merge within a sklearn Pipeline?
Tied weights in Autoencoder
Convert dataframe columns of object type to float
How to Merge Numerical and Embedding Sequential Models to treat categories in RNN
Improving k-means clustering
What to do first: Feature Selection or Model Parameters Setting?
How to prune a tree in R?
What does sklearn "RidgeClassifier" do?
Java Open Source Text Mining Frameworks [closed]
Scalable or online out-of-core multi-label classifiers
Put customized functions in Sklearn pipeline
Tensorflow feature column for variable list of values
Combining Rolling Origin Forecast Resampling and Group V-Fold Cross-Validation in rsample
LSTM Followed by Mean Pooling
EM score in SQuAD Challenge
Pytorch ValueError: optimizer got an empty parameter list
What algorithms are suitable for this simple machine learning problem?
SVM in Matlab: Meaning of Parameter 'box constraint' in function fitcsvm
Intuition for perceptron weight update rule
Which feature scaling method to use before PCA?
