I'm looking at the TensorFlow implementation of a CNN on CIFAR-10, and I noticed that after the first convolutional layer they do pooling, then normalization, but after the second layer they do normalization, then pooling.
I'm just wondering what the rationale behind this is, and any tips on when/why we should choose to do normalization before pooling would be greatly appreciated. Thanks!
It should be pooling first, normalization second.
The original code link in the question no longer works, but I'm assuming the normalization being referred to is batch normalization; the main idea probably applies to other normalization schemes as well. As the authors note in the paper introducing batch normalization, one of its main purposes is "normalizing layer inputs". The simplified version of the idea: if the inputs to each layer have a nice, reliable distribution of values, the network can train more easily. Putting the normalization second (after pooling) allows this to happen.
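For illustration, here is a minimal Keras sketch of the pool-first ordering. This is not the tutorial's original code (that link is dead); I'm assuming batch normalization as above, and the layer sizes are arbitrary:

```python
import tensorflow as tf

# A minimal sketch of the pool-first, normalize-second ordering.
# Assumes batch normalization; filter counts and kernel sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu",
                           input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(pool_size=2),   # pool first
    tf.keras.layers.BatchNormalization(),        # then normalize
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
```

Swapping the `MaxPooling2D` and `BatchNormalization` lines gives the normalize-first ordering the question asks about.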
As a concrete example, consider the activations [0, 99, 99, 100]. To keep things simple, we use a 0-1 normalization and a max pooling with kernel size 2. If the values are normalized first, we get [0, 0.99, 0.99, 1], and pooling then gives [0.99, 1]. This does not provide a nice distribution of inputs to the next layer. If we instead pool first, we get [99, 100], and normalizing then gives [0, 1]. This means we can control the distribution of the inputs to the next layer to be whatever best promotes training.
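If you want to check the arithmetic yourself, here is a small NumPy sketch of the example above; `min_max_normalize` and `max_pool_1d` are illustrative helpers written for this answer, not library functions:

```python
import numpy as np

def min_max_normalize(x):
    # Simple 0-1 (min-max) normalization, as in the example above.
    return (x - x.min()) / (x.max() - x.min())

def max_pool_1d(x, kernel=2):
    # Non-overlapping 1-D max pooling with the given kernel size.
    return x.reshape(-1, kernel).max(axis=1)

activations = np.array([0.0, 99.0, 99.0, 100.0])

# Normalize first, then pool: the outputs are squeezed together.
print(max_pool_1d(min_max_normalize(activations)))  # [0.99 1.  ]

# Pool first, then normalize: the outputs span the full 0-1 range.
print(min_max_normalize(max_pool_1d(activations)))  # [0. 1.]
```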