Pooling, whether max or average, can be considered a kind of convolution, right?
The difference is that a conv layer has parameters to optimize but pooling doesn't, right? E.g. the weights of the filter in pooling are not changed during learning.
I'd also like to know what the difference is between the aims of conv and pooling.
Why do we use each layer? What happens if we leave one of them out?
A conv layer has parameters to learn (the weights you update at each step), whereas a pooling layer does not: it just applies a given function, e.g. the max function.
Convolution: combine filter values and input values (multiply and add). Pooling: use only the input values. The output is an input-derived operation over the window (e.g. max, mean, median) that "collapses" the values into one. Max is the most common.
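To make the contrast concrete, here is a minimal NumPy sketch for a single 3x3 window. The window and filter values are made up for illustration; in a real conv layer the filter values would be learned. Note also that what CNNs call "convolution" is usually computed without flipping the kernel, which is what this sketch does.

```python
import numpy as np

# Hypothetical 3x3 input window and 3x3 filter, chosen only for illustration.
window = np.array([[1.0, 2.0, 0.0],
                   [4.0, 9.0, 3.0],
                   [5.0, 1.0, 2.0]])
kernel = np.array([[0.0, 1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0, 1.0, 0.0]])

# Convolution step: multiply filter and input element-wise, then add.
conv_out = float(np.sum(window * kernel))

# Pooling step: no filter values involved, only the inputs in the window.
max_out = float(window.max())
mean_out = float(window.mean())

print(conv_out, max_out, mean_out)
```

The conv result depends on both the window and the filter, while the two pooling results depend on the window alone.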
The pooling layer summarises the features present in a region of the feature map generated by a convolution layer. So, further operations are performed on summarised features instead of precisely positioned features generated by the convolution layer.
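As a rough illustration of "summarised instead of precisely positioned", the sketch below applies a non-overlapping 2x2 max pool to two feature maps whose single active value sits at slightly different positions. The helper `max_pool_2x2` is a hypothetical name written for this example, and it assumes even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    h, w = x.shape
    # Split rows and columns into blocks of 2, then take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Same feature, detected one pixel apart (both inside the top-left 2x2 block).
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
print(np.array_equal(pooled_a, pooled_b))
```

The two pooled maps are identical, so later layers no longer see the one-pixel shift. This only holds while the shift stays inside the same pooling window.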
A convolutional layer is the main building block of a CNN. It contains a set of filters (or kernels), parameters of which are to be learned throughout the training. The size of the filters is usually smaller than the actual image. Each filter convolves with the image and creates an activation map.
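A minimal sketch of how one filter slides over an image to produce an activation map, assuming no padding and a step of one pixel. The function name `conv2d_valid` and the filter values are assumptions made for this example; a real layer would learn the filter and usually has many filters and channels.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, step 1, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter with the patch under it, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 "image" whose values increase by 1 from left to right.
image = np.arange(25, dtype=float).reshape(5, 5)

# Hand-picked filter that responds to left-to-right increases.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

activation_map = conv2d_valid(image, kernel)
print(activation_map.shape)
```

A 3x3 filter on a 5x5 image gives a 3x3 activation map, because the filter only fits in 3 positions along each axis.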
The difference can be summarized in (1) how you compute them and (2) what each is used for.
Take, for example, input data that is a 5x5 matrix (think of an image of 5 by 5 pixels). The pooling layer and the convolution layer are operations that are applied to each of the input "pixels". Let's take a pixel in the center of the image (to avoid discussing what happens at the corners, which I will elaborate on later) and define a 3x3 "kernel" for both the pooling layer and the convolution layer.
Pooling layer: you superimpose the pooling kernel on the input pixel (in the figure, you put the center of the blue matrix on top of the black X_00) and take the maximum.
Convolutional layer: you superimpose the convolutional kernel on the input pixel (in the figure, you put the center of the orange matrix on top of the black X_00), then perform the element-wise multiplication and the summation as indicated in the figure.
Where do the convolution coefficients F_.. come from? They are learnt when training the network. For max pooling, you do not have to learn anything; you just take the maximum. You can think of max pooling as being like a convolution with fixed coefficients that takes the maximum instead of summing.
You perform this for each input element. What happens at the input image borders and corners depends on your choice: discard the input elements at the sides/corners, pad, etc. Also, you do not have to move pixel by pixel; you can jump by several pixels at a time (the stride).
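A small sketch of the "fixed coefficients" idea, with made-up 3x3 windows. Average pooling really is a convolution step whose coefficients are all fixed at 1/9. Max pooling is only analogous: it is not linear, so no fixed set of coefficients can reproduce it, which the last two lines check on one example.

```python
import numpy as np

# Two hypothetical 3x3 windows, chosen only for illustration.
x = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
y = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# Average pooling == multiply-and-add with fixed, non-learned weights of 1/9.
avg_kernel = np.full((3, 3), 1.0 / 9.0)
avg_as_conv = float(np.sum(x * avg_kernel))
avg_direct = float(x.mean())

# A multiply-and-add operation f is linear: f(x + y) == f(x) + f(y).
# Max does not satisfy this, so it cannot be written as a convolution.
max_of_sum = float(np.max(x + y))
sum_of_max = float(np.max(x) + np.max(y))

print(avg_as_conv, avg_direct, max_of_sum, sum_of_max)
```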
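The effect of these choices on the output size can be sketched with the usual formula floor((n + 2p - k) / s) + 1, where n is the input size, k the kernel size, p the padding added on each side, and s the step (stride). The function name `output_size` is just a label for this example, and the formula is applied per dimension, assuming the same padding on both sides.

```python
def output_size(n, k, padding=0, stride=1):
    """Output length along one axis for input size n and kernel size k."""
    return (n + 2 * padding - k) // stride + 1

# 5x5 input, 3x3 kernel, as in the example above.
no_pad = output_size(5, 3)                        # discard the borders
padded = output_size(5, 3, padding=1)             # pad one pixel per side
strided = output_size(5, 3, stride=2)             # jump two pixels at a time
both = output_size(5, 3, padding=1, stride=2)     # pad and jump

print(no_pad, padded, strided, both)
```

With no padding the 5x5 input shrinks to 3x3; padding by one pixel keeps it at 5x5; a step of 2 shrinks it further.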
