Why do dilated convolutions preserve resolution?

Tags:

enter image description here

The animation is from here. I am wondering why the dilated convolution is claimed to preserve resolution. Apparently the input in blue is 7x7 and the output in green is 3x3.

EDIT:

One way to work around the resolution loss is to pad the input with roughly half the size of the current receptive field, but

this essentially undermines the statement that dilated convolutions do not lose resolution, because it is the padding that preserves the resolution. To get the same output size with the input, a conventional convolution needs even less padding.
since the padding grows exponentially, a relative not-that-small dilation factor will leads to a heavily padded input image. Imagine a 1024x1024 input with 10x dilation, it will become about 2048x2048 (please let me know if I am wrong here). This is 4x the original size, which means most of the convolutions are done on the padded area instead of the real input. Personally this seems quite counterintuitive to me.

604

asked Aug 22 '17 05:08

wlnirvana

1 Answers

This is indeed a dilated convolution with a 5x5 filter. If you imagine the blue part of the animation as a 3x3 image that's 0 padded, it preserves resolution.

With regard to your edit, the emphasis is really in this statement in the post you linked: dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage

Padding is done to preserve resolution. That is correct.

What we really want here is to expand the size of the receptive field. In the post you linked, with 3 3x3 dilated convolutions at an increasing dilation, we already achieve a receptive field of 15x15 in the feature maps.

To achieve the equivalent with 3x3 convolutions and no loss of coverage and no loss of resolution, we can do it with a stride of 3 (4 would result in loss of coverage) and extremely heavy padding (to the extent where it's like what you said, convolution with mostly padded zeroes). However, we will need 4 3x3 convolutions with stride 3 instead of 3 to achieve a receptive field of 15x15.

On top of that, the normal convolutions would have even more convolutions that don't make sense, compared to the dilated convolution case.

183

answered Sep 19 '22 18:09

jkschin

Related questions
                            
                                Tensorflow minimise with respect to only some elements of a variable
                            
                                Python - A way to learn and detect text patterns?
                            
                                How to discover new classes in a classification machine learning algorithm?
                            
                                Neural network bias for each neuron
                            
                                What does it mean when train and validation loss diverge from epoch 1?
                            
                                Should RNN attention weights over variable length sequences be re-normalized to "mask" the effects of zero-padding?
                            
                                Can a model have both high bias and high variance? Overfitting and Underfitting?
                            
                                How does sp_randint work?
                            
                                How to register a custom gradient for a operation composed of tf operations
                            
                                Keras - how to get unnormalized logits instead of probabilities
                            
                                Why is the Mean Average Percentage Error(mape) extremely high?
                            
                                Extract text information from PDF files with different layouts - machine learning
                            
                                Subtract mean from image
                            
                                SkLearn Multinomial NB: Most Informative Features
                            
                                Tensorflow Object Detection API
                            
                                How to achieve stratified K fold splitting for arbitrary number of categorical variables?
                            
                                What does "shuffle" do in fit_generator in keras?
                            
                                Are there programs that iteratively write new programs?
                            
                                Simple example using BernoulliNB (naive bayes classifier) scikit-learn in python - cannot explain classification
                            
                                How to apply Machine Learning algorithm in PHP? [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why do dilated convolutions preserve resolution?

Tags:

machine-learning

computer-vision

convolution

wlnirvana

People also ask

1 Answers

jkschin

Recent Activity

Donate For Us