Residual networks are always built with convolutional layers. I have never seen residual networks with only fully connected layers. Does it work to build a residual network with only fully connected layers?
So, let's start with: what is the aim of ResNets?
Given an input X that is propagated through some ensemble of layers, let us call F(X) the output of this ensemble. If we denote by H(X) the desired output (the ideal mapping, which in general differs from F(X) alone), a ResNet learns H(X) = F(X) + X, which can be rewritten as F(X) = H(X) - X, i.e. the residual, from which the name residual network.
Thus, what is the gain of a ResNet?
In a ResNet, the mapping learned by an added layer performs at least as well as the one before it. Why? Because, at worst, the block can learn the identity mapping: if the residual branch outputs F(X) = 0, the block simply returns H(X) = X.
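To make this fallback to the identity concrete, here is a minimal sketch (assuming PyTorch, which is not mentioned in the answer) of a residual block whose residual branch is zero-initialized, so that it computes exactly H(X) = X before any training:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes H(x) = F(x) + x with a small convolutional branch F."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Zero-initializing the last layer makes F(x) = 0 at the start,
        # so the whole block begins as the identity H(x) = x.
        nn.init.zeros_(self.conv2.weight)
        nn.init.zeros_(self.conv2.bias)

    def forward(self, x):
        residual = self.conv2(torch.relu(self.conv1(x)))  # F(x)
        return residual + x                               # H(x) = F(x) + x

x = torch.randn(1, 8, 16, 16)
block = ResidualBlock(8)
print(torch.allclose(block(x), x))  # True: identity mapping before training
```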
This fallback to the identity is a crucial point for convolutional networks: deeper nets should perform at least as well as shallower ones, but in practice this does not always happen. From this arises the need for an architecture that guarantees such behavior.
Is this also true for dense networks? No, it is not. There is a well-known result for dense nets, the Universal Approximation Theorem, which states that a network with a single hidden layer containing enough hidden units can approximate any continuous function to arbitrary accuracy. For this reason, it is not necessary to increase the depth of a dense net; rather, it is necessary to find the right number of hidden units.
If you want, you can explore the original paper by He et al. (2015).
Yes, you can use residual connections in fully connected networks. Skip connections help learning for fully connected layers as well.
Here is a nice paper (not mine, unfortunately) where this is done and where the authors explain in detail why it helps learning: https://arxiv.org/pdf/1701.09175.pdf
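To illustrate, here is a minimal sketch of a residual network built only from fully connected layers (assuming PyTorch; the widths, block count, and output size are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """A residual block made only of fully connected layers: H(x) = F(x) + x."""
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x))) + x  # skip connection

class DenseResNet(nn.Module):
    """Stacks several fully connected residual blocks between an input and an output layer."""
    def __init__(self, in_features, width, n_blocks, out_features):
        super().__init__()
        self.input = nn.Linear(in_features, width)
        self.blocks = nn.Sequential(*[DenseResidualBlock(width) for _ in range(n_blocks)])
        self.output = nn.Linear(width, out_features)

    def forward(self, x):
        return self.output(self.blocks(torch.relu(self.input(x))))

model = DenseResNet(in_features=20, width=64, n_blocks=4, out_features=10)
print(model(torch.randn(32, 20)).shape)  # torch.Size([32, 10])
```

Keeping each block's input and output width the same avoids any projection on the skip path; if the dimensions change, the skip connection needs a linear projection, as in the original ResNet paper.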