In most architectures, conv layers are followed by a pooling layer (max, avg, etc.). Since those pooling layers just select from the output of the previous layer (i.e. conv), can we instead use a convolution with stride 2 and expect similar accuracy with less computation?
Max-pooling helps in extracting low-level features like edges, points, etc., while avg-pooling favors smooth features. If time constraints are not a problem, one can skip the pooling layer and use a convolutional layer to do the same.
Average pooling smooths out the image, so sharp features may not be identified when it is used. Max pooling selects the brighter pixels from the image. It is useful when the background of the image is dark and we are interested only in the lighter pixels.
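To make the difference concrete, here is a minimal sketch in plain Python of non-overlapping 2x2 pooling on a tiny image with a dark background and one bright pixel. The helper names `pool2d` and `mean` and the pixel values are made up for illustration only.

```python
def pool2d(image, size, reduce_fn):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values inside the current pooling window.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Dark background (0) with a single bright pixel (8) in the top-left window.
image = [
    [0, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

max_pooled = pool2d(image, 2, max)   # keeps the bright pixel at full strength
avg_pooled = pool2d(image, 2, mean)  # spreads it over the window

print(max_pooled)  # [[8, 0], [0, 0]]
print(avg_pooled)  # [[2.0, 0.0], [0.0, 0.0]]
```

Max pooling passes the bright pixel through unchanged, while average pooling dilutes it to a quarter of its value.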
The results show that max-pooling achieves a higher accuracy (84.6%) than average pooling. Future studies are encouraged to collect more data to further confirm the effectiveness of the max-pooling layer. Keywords: Scoliosis, Lenke, Convolutional Neural Network, Max-pooling, Average pooling.
In essence, max-pooling (or any kind of pooling) is a fixed operation, and replacing it with a strided convolution can be seen as learning the pooling operation, which increases the model's expressive power.
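One way to see the "learned pooling" view: 2x2 average pooling is exactly a stride-2 convolution whose 2x2 kernel is fixed at 0.25 everywhere. A learned strided convolution can therefore recover average pooling or settle on something else. The sketch below checks this for a single channel in plain Python. The function names are illustrative. Max-pooling is not linear, so a single convolution cannot reproduce it exactly in the same way.

```python
def conv2d_single_channel(image, kernel, stride):
    """Valid (no padding) 2D cross-correlation of one channel with one square kernel."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - k + 1, stride):
        out_row = []
        for c in range(0, cols - k + 1, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


def avg_pool2d(image, size):
    """Non-overlapping size x size average pooling."""
    rows, cols = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] for i in range(size) for j in range(size)) / (size * size)
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# A fixed kernel that averages its 2x2 window.
averaging_kernel = [[0.25, 0.25], [0.25, 0.25]]

strided_conv_out = conv2d_single_channel(image, averaging_kernel, stride=2)
avg_pool_out = avg_pool2d(image, 2)

print(strided_conv_out)  # [[3.5, 5.5], [11.5, 13.5]]
print(avg_pool_out)      # [[3.5, 5.5], [11.5, 13.5]]
```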
Also, pooling is faster to compute than convolution. Still, you can always try replacing pooling with a strided convolution and see what works better. I asked one of my friends about this and he said pooling layers are better because they introduce non-linearity. Do you agree? Hm, I'm not so sure I agree.
On the other hand, pooling is a cheaper operation than convolution, both in terms of the amount of computation you need to do and the number of parameters you need to store (a pooling layer has no parameters). There are cases where one of them is a better choice than the other. The first layer in ResNet uses a strided convolution.
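As a rough illustration of the parameter side of this argument, the sketch below counts the learnable parameters of a standard 2D convolution (weights plus optional bias) and compares that to a pooling layer, which has none. The 64-channel, 3x3 configuration is an arbitrary example, not taken from any particular network.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a standard 2D convolution with a square kernel."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Max and average pooling have no learnable parameters.
POOL_PARAM_COUNT = 0

# Example: a 3x3 convolution that keeps 64 channels.
conv_params = conv2d_param_count(in_channels=64, out_channels=64, kernel_size=3)

print(conv_params)       # 36928
print(POOL_PARAM_COUNT)  # 0
```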
It depends on what you replace with what. I was assuming a choice between A: conv (stride=1) + max pooling or B: conv (stride=1) + conv (stride=2). If you instead assume A: conv (stride=1) + max pooling replaced by B: conv (stride=2) things become different (B is then faster of course).
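The comparison can be sketched with a back-of-the-envelope count of multiply-accumulate operations (MACs). The numbers assume a 32x32 input with 64 channels, 3x3 convolutions with padding 1, and 2x2 max pooling with stride 2. All of these are illustrative choices, and real runtimes also depend on hardware and implementation.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(size, in_ch, out_ch, kernel, stride, padding):
    """Multiply-accumulates of a standard 2D convolution on a square input."""
    o = out_size(size, kernel, stride, padding)
    return o * o * in_ch * out_ch * kernel * kernel


def max_pool_comparisons(size, channels, kernel, stride):
    """Comparisons for max pooling: kernel*kernel - 1 per output value."""
    o = out_size(size, kernel, stride, 0)
    return o * o * channels * (kernel * kernel - 1)


SIZE, CH = 32, 64  # assumed input resolution and channel count

conv_s1 = conv_macs(SIZE, CH, CH, kernel=3, stride=1, padding=1)
conv_s2 = conv_macs(SIZE, CH, CH, kernel=3, stride=2, padding=1)
pool = max_pool_comparisons(SIZE, CH, kernel=2, stride=2)

# A: conv (stride=1) + max pooling
cost_conv_then_pool = conv_s1 + pool
# B: conv (stride=1) + conv (stride=2); the second conv also sees a 32x32 input
cost_two_convs = conv_s1 + conv_s2
# Alternative B: a single conv (stride=2) replacing both layers
cost_single_strided = conv_s2

print(cost_conv_then_pool)  # 37797888
print(cost_two_convs)       # 47185920
print(cost_single_strided)  # 9437184
```

Under these assumptions, adding a second strided convolution costs more than pooling, while folding the stride into the existing convolution costs about a quarter of the stride-1 convolution.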
Yes, that can be done. It's explained in the paper 'Striving for Simplicity: The All Convolutional Net'
https://arxiv.org/pdf/1412.6806.pdf. Quote from the paper:
'We find that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks'
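As a sketch of what such a replacement might look like on paper, the snippet below takes a hypothetical layer list (plain dictionaries, not any framework's API), swaps each pooling entry for a convolution with the same window and stride, and checks that the spatial output size is unchanged. Reusing the pooling window as the kernel size and keeping the channel count are assumptions made here for illustration. The list is assumed to start with a convolution, so the channel count is known when a pooling entry is reached.

```python
def replace_pooling(layers):
    """Return a copy of `layers` with each pooling entry swapped for a strided conv."""
    replaced = []
    channels = None
    for layer in layers:
        if layer["type"] == "pool":
            # Assumption: keep the channel count and reuse the pooling window and stride.
            replaced.append({
                "type": "conv",
                "out_channels": channels,
                "kernel": layer["kernel"],
                "stride": layer["stride"],
                "padding": 0,
            })
        else:
            if layer["type"] == "conv":
                channels = layer["out_channels"]
            replaced.append(dict(layer))
    return replaced


def spatial_size(size, layers):
    """Trace the spatial size of a square input through the layer list."""
    for layer in layers:
        size = (size + 2 * layer.get("padding", 0) - layer["kernel"]) // layer["stride"] + 1
    return size


# Hypothetical two-layer block: 3x3 conv followed by 2x2 max pooling.
original = [
    {"type": "conv", "out_channels": 96, "kernel": 3, "stride": 1, "padding": 1},
    {"type": "pool", "kernel": 2, "stride": 2},
]
all_conv = replace_pooling(original)

print(spatial_size(32, original))  # 16
print(spatial_size(32, all_conv))  # 16
```

The downsampling behaviour is the same in both versions. What changes is that the second layer now has learnable weights.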