In deep neural networks, we can implement skip connections to help:
Solve the problem of vanishing gradients and speed up training
Let the network learn a combination of low-level and high-level features
Recover information lost during downsampling, e.g. max pooling.
https://medium.com/@mikeliao/deep-layer-aggregation-combining-layers-in-nn-architectures-2744d29cab8
However, reading some source code, I noticed that some implementations use concatenation for the skip connection while others use summation. So my question is: what are the benefits of each of these implementations?
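To make the question concrete, a minimal sketch of the two variants I mean (assuming PyTorch; the shapes are arbitrary):

```python
import torch

# Two feature maps to be merged by a skip connection: (batch, channels, height, width).
x = torch.randn(1, 64, 32, 32)     # output of the current layer
skip = torch.randn(1, 64, 32, 32)  # output carried over from an earlier layer

# Variant 1: element-wise summation -- channel count stays at 64.
summed = x + skip
print(summed.shape)        # torch.Size([1, 64, 32, 32])

# Variant 2: concatenation along the channel dimension -- channel count grows to 128.
concatenated = torch.cat([x, skip], dim=1)
print(concatenated.shape)  # torch.Size([1, 128, 32, 32])
```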
Skip connections (or shortcut connections), as the name suggests, skip some of the layers in the neural network and feed the output of one layer as the input to later layers. Skip connections were introduced to solve different problems in different architectures.
At present, the skip connection is a standard module in many convolutional architectures. By using a skip connection, we provide an alternative path for the gradient during backpropagation. It has been experimentally validated that these additional paths are often beneficial for model convergence.
A block with a skip connection as in the image above is called a residual block, and a Residual Neural Network (ResNet) is just a stack of such blocks.
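A minimal sketch of such a residual block (assuming PyTorch; batch normalization and the projection shortcut used when input and output shapes differ are omitted for brevity):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = F(x) + x (element-wise summation)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                  # the skip (shortcut) path
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity          # summation skip connection
        return self.relu(out)

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32]) -- channel count unchanged
```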
Skip connections are also used to concatenate the feature maps from encoder layers to the corresponding decoder layers (as shown in Fig. 4). The skip-connection architecture provides spatial information to each decoder so that it can effectively recover fine-grained details when producing output masks.
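A sketch of what such a concatenating skip connection can look like in an encoder-decoder (U-Net-style) network, again assuming PyTorch; the layer choices and shapes are illustrative, not taken from any specific implementation:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Upsample, concatenate the encoder feature map, then convolve."""
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        # The convolution sees out_channels (from upsampling) + skip_channels (from the encoder).
        self.conv = nn.Conv2d(out_channels + skip_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, encoder_feat):
        x = self.up(x)                           # recover spatial resolution
        x = torch.cat([x, encoder_feat], dim=1)  # concatenation skip connection from the encoder
        return self.relu(self.conv(x))

dec = DecoderBlock(in_channels=128, skip_channels=64, out_channels=64)
bottleneck = torch.randn(1, 128, 16, 16)    # low-resolution decoder input
encoder_feat = torch.randn(1, 64, 32, 32)   # higher-resolution encoder feature map
print(dec(bottleneck, encoder_feat).shape)  # torch.Size([1, 64, 32, 32])
```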
Basically, the difference lies in the way the final layers are influenced by intermediate features.
Standard architectures with skip connections using element-wise summation (e.g. ResNet) can be viewed, to some extent, as an iterative estimation procedure (see for instance this work), where the features are refined through the successive layers of the network. The main benefits of this choice are that it works and that it is a compact solution (it keeps the number of features fixed across a block).
Architectures with concatenated skip connections (e.g. DenseNet) allow subsequent layers to re-use intermediate representations, maintaining more information, which can lead to better performance. Apart from feature re-use, another consequence is implicit deep supervision (as in this work), which allows better gradient propagation across the network, especially for deep networks (in fact, it has also been used in the Inception architecture).
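A minimal sketch of the DenseNet-style concatenation pattern (assuming PyTorch; batch normalization and the bottleneck layers from the original paper are omitted):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """Each layer produces `growth_rate` new feature maps and concatenates them to its input."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        new_features = self.relu(self.conv(x))
        # Concatenation skip: later layers see all previously computed feature maps.
        return torch.cat([x, new_features], dim=1)

x = torch.randn(1, 16, 32, 32)
for _ in range(3):
    layer = DenseLayer(x.shape[1], growth_rate=12)
    x = layer(x)
    print(x.shape)  # channels grow: 28, 40, 52 -- earlier features are carried along and re-used
```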
Obviously, if not properly designed, concatenating features can lead to an exponential growth in the number of parameters (this explains, in part, the hierarchical aggregation used in the work you pointed out) and, depending on the problem, using so much information can lead to overfitting.
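As a toy illustration of that growth (not tied to any particular architecture): if every layer naively concatenated its full input with an equally wide output, the channel count would double at each layer, whereas summation keeps it constant.

```python
channels_sum, channels_cat = 64, 64
for layer in range(1, 5):
    channels_cat *= 2  # naive concatenation: width doubles each layer; summation stays at 64
    print(layer, channels_sum, channels_cat)
# 1 64 128
# 2 64 256
# 3 64 512
# 4 64 1024
```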