
What's the difference between a conv layer and a pooling layer in a CNN?

Pooling can be considered as convolution whether it's max/average, right?

The difference is that conv has parameters that are optimized, but pooling doesn't, right? E.g. the weights of the pooling filter are not changed during learning.

I'd also like to know the difference between the aims of conv and pooling.

Why do we use each layer? What happens if we leave one of them out?

asked Apr 19 '17 02:04 by hrsma2i



1 Answer

The difference can be summarized in terms of (1) how you compute them and (2) what each is used for.

  1. How you compute them:

Take as an example an input that is a 5x5 matrix (think of an image of 5 by 5 pixels). Both the pooling layer and the convolution layer are operations applied at each input "pixel". Let's take a pixel in the center of the image (to avoid discussing what happens at the corners; more on that later) and define a 3x3 "kernel" for both the pooling layer and the convolution layer.

Pooling layer: you superimpose the pooling kernel on the input pixel (in the figure, you put the center of the blue matrix on top of the black X_00) and take the maximum of the values it covers.

Convolutional layer: you superimpose the convolutional kernel on the input pixel (in the figure, you put the center of the orange matrix on top of the black X_00), then perform the element-wise multiplication followed by the summation, as indicated in the figure.

Where do the convolution coefficients, F_.., come from? They are learned when training the network. For max pooling you do not have to learn anything; you just take the maximum. You can think of max pooling as being like a convolution with fixed coefficients that takes the maximum instead of summing.
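As a rough illustration of the two computations at the center pixel, here is a minimal sketch in plain Python. The image values and the kernel coefficients are made up for the example (in a real network the coefficients would be learned), and the helper names `window`, `max_pool_at` and `conv_at` are hypothetical. Note that, as in most deep-learning frameworks, the "convolution" here does not flip the kernel.

```python
# 5x5 input "image"; the values are arbitrary and only for illustration.
image = [
    [1, 2, 0, 1, 3],
    [4, 6, 1, 0, 2],
    [0, 3, 9, 5, 1],
    [2, 1, 4, 7, 0],
    [5, 0, 2, 3, 8],
]

# Hypothetical 3x3 convolution coefficients; a trained network would learn these.
conv_kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]


def window(img, row, col, size=3):
    """Return the size x size patch of img centered on (row, col)."""
    half = size // 2
    return [r[col - half:col + half + 1] for r in img[row - half:row + half + 1]]


def max_pool_at(img, row, col, size=3):
    """Max pooling at one position: no coefficients, just the maximum."""
    return max(v for patch_row in window(img, row, col, size) for v in patch_row)


def conv_at(img, kernel, row, col):
    """Convolution at one position: element-wise multiply, then sum."""
    patch = window(img, row, col, len(kernel))
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


center_pool = max_pool_at(image, 2, 2)               # 9
center_conv = conv_at(image, conv_kernel, 2, 2)      # -2
print(center_pool, center_conv)
```

Both functions look at the same 3x3 patch; the only difference is that `conv_at` needs a set of coefficients and `max_pool_at` does not.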

You perform this for each input element. What happens at the corners and sides of the input image depends on your choice: discard the input elements there, pad the image, etc. You also do not have to move pixel by pixel; you can jump several pixels at a time (a stride greater than 1).
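The effect of these choices on the output size can be sketched as follows. This is only an illustration in plain Python: the function names are hypothetical, and padding with zeros is an assumption that is harmless here because the example image has no negative values.

```python
def pad_image(img, pad, value=0):
    """Surround img with `pad` rows and columns filled with `value`."""
    width = len(img[0]) + 2 * pad
    border = [[value] * width for _ in range(pad)]
    middle = [[value] * pad + list(row) + [value] * pad for row in img]
    return border + middle + [list(row) for row in border]


def max_pool2d(img, size=3, stride=1, pad=0):
    """Slide a size x size max-pooling window over img with the given stride."""
    padded = pad_image(img, pad)
    out_rows = (len(padded) - size) // stride + 1
    out_cols = (len(padded[0]) - size) // stride + 1
    return [
        [
            max(
                padded[r][c]
                for r in range(i * stride, i * stride + size)
                for c in range(j * stride, j * stride + size)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# 5x5 image whose entries are 0..24, row by row (illustrative values).
image = [[5 * r + c for c in range(5)] for r in range(5)]

valid = max_pool2d(image)               # borders discarded: 3x3 output
same = max_pool2d(image, pad=1)         # padded by one pixel: 5x5 output
strided = max_pool2d(image, stride=2)   # jumping two pixels: 2x2 output

print(len(valid), len(same), len(strided))
print(strided)
```

Discarding the borders shrinks the 5x5 input to 3x3, padding by one pixel keeps it at 5x5, and a stride of 2 without padding gives 2x2.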

  2. What each is used for: max pooling reduces the size of the input and performs a kind of summarization of the data, while also providing some invariance to translations (e.g. if the object moves left-right or up-down). Convolution, depending on the conditions on the filter coefficients (e.g. one column must be negative while another is positive), can be regarded as a filter that extracts certain patterns, such as vertical lines, horizontal lines, etc.
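Both roles can be seen in a small sketch, again in plain Python with hypothetical helper names and a hand-picked kernel (one negative column, one positive column) standing in for learned coefficients. A thin vertical line is drawn at two neighboring positions; the convolution responses differ, but in this particular case the pooled outputs are identical because the shift stays inside one pooling window.

```python
def conv2d(img, kernel):
    """Convolution without padding, stride 1 (kernel not flipped)."""
    k = len(kernel)
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(len(img[0]) - k + 1)
        ]
        for i in range(len(img) - k + 1)
    ]


def max_pool2d(img, size=2, stride=2):
    """Max pooling without padding."""
    return [
        [
            max(img[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(img[0]) - size + 1, stride)
        ]
        for i in range(0, len(img) - size + 1, stride)
    ]


def vertical_line(col, height=6, width=6):
    """Black image with a single bright vertical line at column `col`."""
    return [[1 if c == col else 0 for c in range(width)] for _ in range(height)]


# Hand-picked kernel: left column negative, right column positive.
vertical_kernel = [[-1, 0, 1]] * 3

response_a = conv2d(vertical_line(2), vertical_kernel)   # 4x4 response map
response_b = conv2d(vertical_line(3), vertical_kernel)   # line moved one pixel right

pooled_a = max_pool2d(response_a)                        # 2x2 summary
pooled_b = max_pool2d(response_b)

print(response_a[0], response_b[0])
print(pooled_a, pooled_b)
```

The convolution turns the 6x6 image into a 4x4 map that responds strongly where the line is, and pooling shrinks that map to 2x2. A larger shift would move the response into a different pooling window, so the invariance only holds for small translations.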

[Figure: input image, max_pool_kernel, conv_kernel]

answered Sep 18 '22 09:09 by Antoni