Pooling vs Pooling-over-time

I understand conceptually what is happening in a max/sum pooling CNN layer, but I see the terms "max pooling over time" and "sum pooling over time" thrown around (e.g., in the paper "Convolutional Neural Networks for Sentence Classification" by Yoon Kim). What is the difference?

asked Jan 31 '18 by Matt



2 Answers

Max-over-time pooling is usually applied in NLP (unlike ordinary max pooling, which is common in CNNs for computer vision tasks), so the setup is a little different.

The input to max-over-time pooling is a feature map c = [c_1, ..., c_{n-h+1}], computed over a sentence of length n with a filter of size h. The convolution operation is very similar to the one used with images, but in this case it is applied to a 1-dimensional sequence of word vectors. This is formula (3) in the paper.
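To make this concrete, here is a minimal NumPy sketch (not the paper's actual code) of computing one feature map: each entry c_i = f(w · x_{i:i+h-1} + b) comes from a window of h word vectors, with the random embeddings and the tanh nonlinearity being illustrative choices.

    import numpy as np

    def feature_map(x, w, b, h):
        """x: (n, k) matrix of word vectors, w: (h*k,) filter, b: scalar bias.
        Returns c = [c_1, ..., c_{n-h+1}], one value per window of h words."""
        n = x.shape[0]
        return np.array([np.tanh(w @ x[i:i + h].ravel() + b)  # f = tanh (illustrative)
                         for i in range(n - h + 1)])

    rng = np.random.default_rng(0)
    n, k, h = 7, 4, 3                    # sentence length, embedding dim, filter width
    x = rng.normal(size=(n, k))          # toy word embeddings
    w, b = rng.normal(size=h * k), 0.0
    c = feature_map(x, w, b, h)
    print(c.shape)                       # (5,) == n - h + 1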

The max-over-time pooling operation is very simple: ĉ = max{c}, i.e., a single number that is the maximum over the whole feature map. The reason to do this, instead of "down-sampling" the sentence as in a vision CNN, is that in NLP sentences naturally have different lengths within a corpus. This makes the feature maps different sizes for different sentences, but we'd like to reduce the representation to a fixed size in order to apply a softmax or regression head at the end. As stated in the paper, it captures the most important feature, the one with the highest value, for each feature map.
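Here is a small sketch (with made-up feature values) of how max-over-time yields a fixed-size output no matter how long the sentence is:

    import numpy as np

    def max_over_time(c):
        """ĉ = max{c}: collapse a whole feature map to its single largest value."""
        return c.max()

    # Feature maps from two sentences of different lengths (values made up):
    short = np.array([0.1, 0.9, 0.3])
    longer = np.array([0.2, 0.5, 0.8, 0.1, 0.4])

    print(max_over_time(short), max_over_time(longer))  # 0.9 0.8 -- one number each

With m filters, stacking the m maxima gives a fixed-length vector z = [ĉ_1, ..., ĉ_m] that can feed the final softmax layer regardless of sentence length.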

Note that in computer vision, images are usually¹ of the same size, like 28x28 or 32x32, so there is no need to downsample the feature maps to 1x1 immediately.

Sum-pooling-over-time works the same way, except the feature map is summed instead of maximized, as in the sketch below.
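A one-line variant of the sketch above, with the max replaced by a sum:

    def sum_over_time(c):
        # Collapse the feature map by summing instead of taking the max.
        return sum(c)

    print(sum_over_time([2, 7, 4, 1, 5]))  # 19 -- still one number per feature map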


¹ Modern CNNs can be trained with images of different sizes, but this requires the network to be all-convolutional, so it doesn't have any pooling layers. See this question for more details.

answered Oct 13 '22 by Maxim


Max pooling typically applies to regions in a 2D feature plane, while max pooling over time operates along a 1D feature vector.

Here is a demonstration of max pooling from Stanford's CS231n:

[Figure: max pooling demonstration from CS231n]

Max pooling over time takes a 1d feature vector and computes the max. The "over time" just means this is happening along the time dimension for some sequential input, like a sentence, or a concatenation of all phrases from a sentence as in the paper you linked.

For example:

[2, 7, 4, 1, 5] -> [7]

Source: CS224d Lecture 13 slides
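To make the contrast concrete, here is a small NumPy sketch (illustrative, not from the slides): 2x2 max pooling with stride 2 keeps a smaller spatial grid, while max pooling over time collapses the whole sequence to one number.

    import numpy as np

    fmap = np.array([[1, 3, 2, 4],
                     [5, 6, 7, 8],
                     [3, 2, 1, 0],
                     [1, 2, 3, 4]])

    # 2x2 max pooling with stride 2: the output is still a (smaller) grid.
    pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)        # [[6 8]
                         #  [3 4]]

    # Max pooling over time: the whole 1D sequence becomes a single number.
    seq = np.array([2, 7, 4, 1, 5])
    print(seq.max())     # 7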

answered Oct 13 '22 by Imran