In the ResNet architecture, why is the ReLU activation applied after the element-wise addition with the residual in a residual block, instead of before it?
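Because that is how it was originally proposed. The ordering of operations inside a residual block was investigated later in https://arxiv.org/pdf/1603.05027.pdf, and the authors found that a "pre-activation" ordering works best: the skip branch is left untouched, the residual branch applies BN -> ReLU -> Conv -> BN -> ReLU -> Conv, and the two are then added with no activation after the addition.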
However, the differences in performance are small in practice, so the original ResNet formulation prevailed. The paper is still worth reading if you want to know which orderings work and which do not.
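To make the two orderings concrete, here is a toy sketch in plain Python. It is not the paper's code: `toy_residual` is a made-up stand-in for the Conv/BN stack, BN is left out entirely, and the function names are my own. It only shows where the ReLU sits relative to the addition.

```python
def relu(v):
    # Element-wise ReLU on a list of floats.
    return [max(0.0, a) for a in v]


def toy_residual(v):
    # Hypothetical stand-in for the conv/BN layers of the residual branch:
    # a fixed element-wise affine map, chosen only for illustration.
    return [0.5 * a - 1.0 for a in v]


def post_activation_block(x, f=toy_residual):
    # Original ResNet ordering: add the skip first, then apply ReLU,
    # so the skip signal also passes through the ReLU.
    return relu([a + b for a, b in zip(x, f(x))])


def pre_activation_block(x, f=toy_residual):
    # Pre-activation ordering: ReLU is applied inside the residual branch,
    # before the layers, and nothing is applied after the addition,
    # so the skip signal x reaches the output unchanged.
    return [a + b for a, b in zip(x, f(relu(x)))]


x = [-2.0, 1.0, 3.0]
print(post_activation_block(x))  # [0.0, 0.5, 3.5]
print(pre_activation_block(x))   # [-3.0, 0.5, 3.5]
```

Note how the first entry differs: with the ReLU after the addition the negative value is clipped to zero, while in the pre-activation version the skip path carries it through as is.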