 

What's the use of dilated convolutions?

I refer to Multi-Scale Context Aggregation by Dilated Convolutions.

  • A 2x2 kernel would have holes inserted so that it effectively becomes a 3x3 kernel.
  • A 3x3 kernel would have holes inserted so that it effectively becomes a 5x5 kernel.
  • The above assumes a one-pixel gap between weights (i.e., a dilation rate of 2), of course.

I can clearly see that this lets you effectively use 4 parameters while having a 3x3 receptive field, or 9 parameters while having a 5x5 receptive field.

Is the point of dilated convolutions simply to save on parameters while reaping the benefit of a larger receptive field, and thus save memory and computation?
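
For concreteness, here is a minimal sketch of what I mean (I'm using PyTorch's dilation argument purely for illustration; the question isn't framework-specific):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 spans a 5x5 window but still has only 9 weights.
conv = nn.Conv2d(1, 1, kernel_size=3, dilation=2, padding=2, bias=False)

print(sum(p.numel() for p in conv.parameters()))   # 9 parameters
x = torch.randn(1, 1, 32, 32)
print(conv(x).shape)                               # torch.Size([1, 1, 32, 32]): resolution preserved
```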

asked Dec 16 '16 by jkschin



2 Answers

TLDR

  1. Dilated convolutions generally improve performance (see the better semantic segmentation results in Multi-Scale Context Aggregation by Dilated Convolutions).
  2. More importantly, the architecture is based on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage.
  3. They allow a larger receptive field at the same computation and memory cost, while also preserving resolution.
  4. Pooling and strided convolutions are similar concepts, but both reduce the resolution (see the shape comparison right after this list).
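
As a quick sanity check of points 3 and 4, here is a minimal shape comparison (PyTorch is my choice here purely for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)

pool    = nn.MaxPool2d(kernel_size=2)                            # 32x32 -> 16x16
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)    # 32x32 -> 16x16
dilated = nn.Conv2d(1, 1, kernel_size=3, dilation=2, padding=2)  # 32x32 -> 32x32

print(pool(x).shape)     # torch.Size([1, 1, 16, 16])
print(strided(x).shape)  # torch.Size([1, 1, 16, 16])
print(dilated(x).shape)  # torch.Size([1, 1, 32, 32]) - resolution preserved
```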

@Rahul referenced WaveNet, which puts it very succinctly in Section 2.1, Dilated Causal Convolutions. It is also worth looking at Multi-Scale Context Aggregation by Dilated Convolutions. I break it down further here:

Screenshot from https://arxiv.org/pdf/1511.07122.pdf

  • Figure (a) is a 1-dilated 3x3 convolution filter. In other words, it's a standard 3x3 convolution filter.
  • Figure (b) is a 2-dilated 3x3 convolution filter. The red dots are where the weights are; everywhere else is 0. In other words, it's a 5x5 convolution filter with 9 non-zero weights and zeros everywhere else, as mentioned in the question (the short numerical check after this list verifies the equivalence). The receptive field in this case is 7x7, because each unit in the previous layer's output itself has a 3x3 receptive field. The highlighted portions in blue show the receptive field and NOT the convolution filter (you could view it as a convolution filter if you wanted to, but that's not helpful).
  • Figure (c) is a 4-dilated 3x3 convolution filter. It's a 9x9 convolution filter with 9 non-zero weights and zeros everywhere else. From (b), each unit now has a 7x7 receptive field, and hence you can see a 7x7 blue portion around each red dot.
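
If it helps, the 2-dilated case in (b) can be checked numerically; a minimal sketch (again, PyTorch is just my choice for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 16, 16)
w = torch.randn(1, 1, 3, 3)                    # the 9 actual weights

# (1) 2-dilated 3x3 convolution
y_dilated = F.conv2d(x, w, dilation=2)

# (2) the same 9 weights scattered into a 5x5 kernel, zeros everywhere else
w_big = torch.zeros(1, 1, 5, 5)
w_big[:, :, ::2, ::2] = w
y_big = F.conv2d(x, w_big)

print(torch.allclose(y_dilated, y_big))        # True: identical outputs
```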

To draw an explicit contrast, consider this:

  • If we use 3 successive layers of 3x3 convolution filters with stride 1, the effective receptive field at the end is only 7x7. However, at the same computation and memory cost, dilated convolutions achieve 15x15. Both operations preserve resolution (the short calculation after this list reproduces these numbers).
  • If we use 3 successive layers of 3x3 convolution filters whose stride grows exponentially at exactly the same rate as the dilations in the paper, we also get a 15x15 receptive field at the end, but we eventually lose coverage as the stride gets larger. This loss of coverage means the effective receptive field is no longer the contiguous region shown above; some parts are not covered.
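
The receptive-field numbers above can be reproduced with a short recurrence: for stride-1 layers, each layer adds (kernel_size - 1) * dilation to the receptive field.

```python
def receptive_field(kernel_sizes, dilations):
    """Effective receptive field of a stack of stride-1 convolution layers."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

print(receptive_field([3, 3, 3], [1, 1, 1]))   # 7  -> plain 3x3 convolutions
print(receptive_field([3, 3, 3], [1, 2, 4]))   # 15 -> dilations as in the paper
```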
answered Oct 08 '22 by jkschin

In addition to the benefits you already mentioned, such as a larger receptive field, efficient computation, and lower memory consumption, dilated causal convolutions also have the following benefits:

  • They preserve the resolution/dimensions of the data at the output layer. This is because the layers are dilated instead of pooled, hence the name dilated causal convolutions.
  • They maintain the ordering of the data. For example, in 1D dilated causal convolutions, where the prediction of each output depends only on previous inputs, the causal structure of the convolution preserves that ordering (see the sketch after this list).
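
As a rough illustration of the ordering point (a sketch only; WaveNet itself stacks many such layers with gating), causality comes from padding only on the left, so each output position sees only current and past inputs while the output length matches the input length:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 20)                      # (batch, channels, time)
w = torch.randn(1, 1, 2)                       # kernel size 2, as in WaveNet
dilation = 4

# Left-pad by (kernel_size - 1) * dilation so no output depends on future samples.
pad = (w.shape[-1] - 1) * dilation
y = F.conv1d(F.pad(x, (pad, 0)), w, dilation=dilation)

print(y.shape)                                 # torch.Size([1, 1, 20]): same length as the input
```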

I'd refer you to the excellent WaveNet paper, which applies dilated causal convolutions to raw audio waveforms to generate speech and music, and even to recognize speech from the raw waveform.

I hope you find this answer helpful.

answered Oct 08 '22 by Rahul