
Can I use Layer Normalization with CNN?

I see that Layer Normalization is a more modern normalization method than Batch Normalization, and it is very simple to code in TensorFlow. But I think layer normalization is designed for RNNs, and batch normalization for CNNs. Can I use layer normalization with a CNN that processes an image classification task? What are the criteria for choosing batch normalization or layer normalization?

Apollo asked Jul 06 '17 06:07

People also ask

When should I use layer normalization?

In conclusion, normalization layers in a model often help to speed up and stabilize the learning process. If training with large batches isn't an issue and the network doesn't have any recurrent connections, Batch Normalization can be used.

Should I use batch normalization or layer normalization?

As batch normalization is dependent on batch size, it's not effective for small batch sizes. Layer normalization is independent of the batch size, so it can be applied to batches with smaller sizes as well. Batch normalization requires different processing at training and inference times.
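For example, with the Keras layers (a minimal sketch; layer defaults assumed from TensorFlow 2.x):

import tensorflow as tf

x = tf.random.normal([4, 8])  # a small batch of 4 examples, 8 features each

bn = tf.keras.layers.BatchNormalization()
ln = tf.keras.layers.LayerNormalization()

# Batch norm needs the `training` flag: it uses batch statistics while
# training and moving averages at inference.
y_train = bn(x, training=True)
y_infer = bn(x, training=False)

# Layer norm computes statistics per example either way, so its
# normalization behaves identically at training and inference time.
y = ln(x)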

Why is layer normalization better in RNN?

Unlike batch normalization, Layer Normalization directly estimates the normalization statistics from the summed inputs to the neurons within a hidden layer, so the normalization does not introduce any new dependencies between training cases.
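A small sketch of that independence, assuming TensorFlow 2.x: changing one example in the batch leaves the other example's layer-norm output untouched.

import tensorflow as tf

ln = tf.keras.layers.LayerNormalization()

a = tf.constant([[1.0, 2.0, 3.0],
                 [10.0, 20.0, 30.0]])
b = tf.constant([[1.0, 2.0, 3.0],
                 [-5.0, 0.0, 99.0]])  # only the second example differs

# The first example's normalized output is identical in both batches,
# because each example is normalized with its own statistics.
print(ln(a)[0])
print(ln(b)[0])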

Where do you put the normalization layer?

Normalization layers normalize the output of the preceding layer, so place the normalization layer directly after the layer whose activations you want normalized, commonly between a convolution and its activation function, as in the sketch below.
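As a Keras sketch of that placement (the Conv → BatchNorm → activation ordering is one common convention, not the only one):

import tensorflow as tf
from tensorflow.keras import layers

# The normalization layer sits directly after the convolution whose
# outputs it normalizes, and before the nonlinearity. The conv bias is
# dropped because batch norm's beta offset makes it redundant.
block = tf.keras.Sequential([
    layers.Conv2D(32, 3, padding="same", use_bias=False),
    layers.BatchNormalization(),
    layers.ReLU(),
])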


1 Answer

You can use layer normalisation in CNNs, but I don't think it is more 'modern' than batch norm; they simply normalise differently. Layer norm normalises all the activations of a single layer for a single example, collecting statistics from every unit within that layer, while batch norm normalises each activation across the whole batch, collecting statistics for every single unit across the batch.
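To make that difference concrete on a CNN feature map (a sketch assuming the usual NHWC layout):

import tensorflow as tf

x = tf.random.normal([8, 32, 32, 16])  # [batch, height, width, channels]

# Batch norm: one mean/variance per channel, pooled over the batch
# and the spatial dimensions.
bn_mean, bn_var = tf.nn.moments(x, axes=[0, 1, 2])  # shapes: [16]

# Layer norm: one mean/variance per example, pooled over all of that
# example's units (height, width and channels).
ln_mean, ln_var = tf.nn.moments(x, axes=[1, 2, 3])  # shapes: [8]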

Batch norm is generally preferred over layer norm, as it normalises every single activation towards a unit Gaussian distribution using that unit's statistics across the batch, while layer norm only pushes the 'average' over all of an example's activations towards a unit Gaussian. But if the batch size is too small to collect reasonable statistics, then layer norm is preferred.
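So yes, you can drop layer norm into an image classifier. A minimal sketch (the architecture and sizes here are placeholders, not a recommendation):

import tensorflow as tf
from tensorflow.keras import layers

# A small CNN classifier using layer norm instead of batch norm,
# e.g. when only tiny batch sizes are feasible. Keras' default
# axis=-1 normalizes each example over its channels; pass
# axis=[1, 2, 3] to normalize over height, width and channels.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, padding="same"),
    layers.LayerNormalization(),
    layers.ReLU(),
    layers.Conv2D(64, 3, padding="same"),
    layers.LayerNormalization(),
    layers.ReLU(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")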

vijay m answered Sep 19 '22 23:09