 

Deep neural network skip connection implemented as summation vs concatenation? [closed]

In deep neural networks, we can implement skip connections to help:

  • Solve the vanishing gradient problem and speed up training

  • Let the network learn a combination of low-level and high-level features

  • Recover information lost during downsampling operations such as max pooling.

https://medium.com/@mikeliao/deep-layer-aggregation-combining-layers-in-nn-architectures-2744d29cab8

However, reading through some source code, I noticed that some implementations realize the skip connection as concatenation and others as summation. So my question is: what are the benefits of each of these implementations?
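For concreteness, here is roughly what I mean by the two variants, as a minimal PyTorch sketch (module names and channel sizes are made up for illustration, not taken from any of the codebases I read):

```python
import torch
import torch.nn as nn

class SumSkip(nn.Module):
    """Skip connection combined by element-wise summation (ResNet-style).
    Input and output must have the same shape, so the feature width stays fixed."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return x + self.conv(x)                       # shapes must match exactly

class ConcatSkip(nn.Module):
    """Skip connection combined by channel-wise concatenation (DenseNet/U-Net style).
    The output is wider than the input, so downstream layers see more channels."""
    def __init__(self, channels, new_channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, new_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)    # channels + new_channels

x = torch.randn(1, 64, 32, 32)
print(SumSkip(64)(x).shape)         # torch.Size([1, 64, 32, 32])
print(ConcatSkip(64, 32)(x).shape)  # torch.Size([1, 96, 32, 32])
```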

asked Mar 08 '18 by Earthgod

People also ask

What are Skip connections in Neural Networks & Types of Skip connections?

Skip connections (or shortcut connections), as the name suggests, skip some of the layers in the neural network and feed the output of one layer as the input to later layers. Skip connections were introduced to solve different problems in different architectures.

Why skip connections might improve the performance of our CNN models?

At present, the skip connection is a standard module in many convolutional architectures. By using a skip connection, we provide an alternative path for the gradient during backpropagation. It has been experimentally validated that these additional paths are often beneficial for model convergence.
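To see why the extra path helps, consider an additive skip connection y = x + F(x): by the chain rule, dL/dx = dL/dy · (1 + dF/dx), so the gradient reaching x always contains the undamped dL/dy term, even when dF/dx becomes very small in a deep stack.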

Does ResNet add or concatenate?

A block with a skip connection is called a residual block, and a Residual Neural Network (ResNet) is a stack of such blocks. Within each block, the skip connection is combined with the block's output by element-wise addition, not concatenation.

Why are there skip connections in a unet?

Skip connections are used to concatenate the feature maps from encoder layers to the corresponding decoder layers. This skip-connection architecture provides spatial information to each decoder so that it can effectively recover fine-grained details when producing output masks.
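As a rough illustration of that encoder-decoder concatenation, here is a minimal PyTorch sketch of one decoder step (the channel sizes are arbitrary and the module is not taken from any particular U-Net implementation):

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One U-Net-style decoder step: upsample the decoder features, concatenate the
    matching encoder features along the channel axis, then fuse with a convolution."""
    def __init__(self, dec_channels, enc_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(dec_channels, dec_channels // 2, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(dec_channels // 2 + enc_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, dec_feat, enc_feat):
        x = self.up(dec_feat)                   # 2x spatial upsampling of decoder features
        x = torch.cat([x, enc_feat], dim=1)     # skip connection: concatenate encoder features
        return torch.relu(self.fuse(x))

enc = torch.randn(1, 64, 64, 64)    # encoder features at the higher resolution
dec = torch.randn(1, 128, 32, 32)   # decoder features at the lower resolution
print(UpBlock(128, 64, 64)(dec, enc).shape)   # torch.Size([1, 64, 64, 64])
```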


1 Answer

Basically, the difference lies in the way the final layers are influenced by intermediate features.

Standard architectures with skip connections based on element-wise summation (e.g. ResNet) can, to some extent, be viewed as an iterative estimation procedure (see for instance this work), where the features are refined through the successive layers of the network. The main benefits of this choice are that it works well and that it is compact: the number of features stays fixed across a block.
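As a rough sketch of this summation variant (illustrative shapes, not the exact torchvision ResNet block):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block with an additive skip connection: the block refines its input,
    and the feature width stays fixed, since x and body(x) are summed element-wise."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # summation: output channels == input channels

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```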

Architectures with concatenated skip connections (e.g. DenseNet) allow subsequent layers to re-use intermediate representations, maintaining more information, which can lead to better performance. Apart from feature re-use, another consequence is implicit deep supervision (as in this work), which allows better gradient propagation across the network, especially for very deep networks (in fact, this kind of supervision has also been used in the Inception architecture).
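And a corresponding sketch of the concatenation variant in the DenseNet spirit (the growth rate and sizes are made up for illustration):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block in the DenseNet spirit: each layer receives the concatenation of all
    previous feature maps, so earlier representations are re-used directly and the
    channel count grows by `growth_rate` per layer."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_feat = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(new_feat)           # keep every intermediate map for re-use
        return torch.cat(features, dim=1)       # in_channels + num_layers * growth_rate

x = torch.randn(1, 64, 32, 32)
print(DenseBlock(64, growth_rate=32, num_layers=4)(x).shape)  # torch.Size([1, 192, 32, 32])
```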

Obviously, if not properly designed, concatenating features can lead to an exponential growth in the number of parameters (this explains, in part, the hierarchical aggregation used in the work you pointed to) and, depending on the problem, carrying around that much information can lead to overfitting.
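One way to see the growth issue is a toy calculation (hypothetical channel counts):

```python
# Toy channel-count arithmetic: naively concatenating an equally wide output with the
# input doubles the width at every layer (exponential), whereas concatenating a small,
# fixed-width output (DenseNet's growth rate) only grows the width linearly.
channels_naive, channels_dense, growth_rate = 64, 64, 32
for _ in range(4):
    channels_naive *= 2                # concat input with an equally wide output
    channels_dense += growth_rate      # concat input with a fixed-width output
print(channels_naive)   # 1024 -> exponential growth
print(channels_dense)   # 192  -> linear growth
```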

answered Sep 19 '22 by Lemm Ras