Where to apply batch normalization on standard CNNs

I have the following architecture:

Conv1
Relu1
Pooling1
Conv2
Relu2
Pooling3
FullyConnect1
FullyConnect2

My question is, where do I apply batch normalization? And what would be the best function to do this in TensorFlow?

asked Nov 06 '17 by whoisraibolt

People also ask

Where should I put Batch Normalization?

In practical coding, Batch Normalization is added either just after a layer's activation function or just before it. Most researchers have reported good results when Batch Normalization is placed after the activation layer.

Where should I put Batch Normalization in CNN?

It is often placed just after the convolution and pooling layers when defining a sequential model, for example in a network for classifying handwritten digits, as sketched below.
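A minimal sketch of what such a definition might look like in tf.keras (the layer sizes, optimizer, and loss below are assumptions for illustration, not code from the original page):

import tensorflow as tf

# Small CNN for handwritten-digit classification with a BatchNormalization
# layer placed after the convolution and pooling layers.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])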

When should a Batch Normalization be done?

Batch normalization is a technique for training very deep neural networks that normalizes the inputs to a layer for every mini-batch. This stabilizes the learning process and can drastically reduce the number of training epochs required to train deep networks.

In what situation is Batch Normalization preferred over layer normalization?

Batch Normalization depends on the mini-batch size and may not work well for small batches. Layer Normalization, on the other hand, does not depend on the mini-batch size. In Batch Normalization, the input values of the same neuron are normalized across all the examples in the mini-batch; in Layer Normalization, all the inputs to a layer are normalized within a single example.
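To make the difference concrete, here is a small sketch (assuming TensorFlow 2.x; the tensor shape is an arbitrary example) showing the two Keras layers side by side:

import tensorflow as tf

x = tf.random.normal([4, 10])  # mini-batch of 4 examples, 10 features each

# Batch Normalization: each feature is normalized with statistics computed
# across the examples in the mini-batch, so it depends on the batch size.
bn = tf.keras.layers.BatchNormalization()
y_bn = bn(x, training=True)

# Layer Normalization: each example is normalized with statistics computed
# across its own features, independently of the other examples.
ln = tf.keras.layers.LayerNormalization()
y_ln = ln(x)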


1 Answer

There's some debate on this question. This Stack Overflow thread and this keras thread are examples of the debate. Andrew Ng says that batch normalization should be applied immediately before the non-linearity of the current layer. The authors of the BN paper said that as well, but according to François Chollet on the keras thread, the BN paper authors now use BN after the activation layer. Likewise, some benchmarks, such as the one discussed in this torch-residual-networks GitHub issue, show BN performing better after the activation layers.

My current opinion (open to being corrected) is that you should do BN after the activation layer, and if you have the budget for it and are trying to squeeze out extra accuracy, try before the activation layer.
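As a rough sketch of the two placements being compared (tf.keras layers are used here as an assumed implementation; the filter counts are placeholders):

import tensorflow as tf

# BN after the activation (the ordering recommended above):
post_activation = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.BatchNormalization(),
])

# BN before the activation (the alternative worth trying):
pre_activation = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3),            # linear conv, no activation yet
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
])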

So adding Batch Normalization to your CNN would look like this:

Conv1
Relu1
BatchNormalization
Pooling1
Conv2
Relu2
BatchNormalization
Pooling3
FullyConnect1
BatchNormalization
FullyConnect2
BatchNormalization
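In TensorFlow, tf.keras.layers.BatchNormalization is a convenient way to express this (tf.layers.batch_normalization was the equivalent in the tf.layers API current when the question was asked). Below is a sketch of the stack above; the filter counts, kernel sizes, unit counts, and input shape are placeholder assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           input_shape=(32, 32, 3)),   # Conv1 + Relu1
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(),                    # Pooling1
    tf.keras.layers.Conv2D(64, 3, activation="relu"),  # Conv2 + Relu2
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPooling2D(),                    # Pooling3
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),     # FullyConnect1
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10),                         # FullyConnect2 (logits)
    tf.keras.layers.BatchNormalization(),
])

Keras handles the training/inference distinction automatically; if you use the lower-level tf.layers.batch_normalization instead, remember to pass the training flag and run the update ops collected in tf.GraphKeys.UPDATE_OPS so the moving statistics get updated.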
answered Sep 17 '22 by skeller88