Why does the BatchNormalization layer have 2048 parameters?

I have the following code.

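import keras

# Functional-API MLP: 4096 inputs -> Dense(512) -> BatchNorm -> Dropout -> Dense(80)
x = keras.layers.Input(batch_shape=(None, 4096))
hidden = keras.layers.Dense(512, activation='relu')(x)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
predictions = keras.layers.Dense(80, activation='sigmoid')(hidden)
# Keras 1 spelled these keywords `input`/`output`; Keras 2 and later use `inputs`/`outputs`.
mlp_model = keras.models.Model(inputs=[x], outputs=[predictions])
mlp_model.summary()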

And this is the model summary:

Layer (type)                     Output Shape          Param #     Connected to                     
input_3 (InputLayer)             (None, 4096)          0                                            
dense_1 (Dense)                  (None, 512)           2097664     input_3[0][0]                    
batchnormalization_1 (BatchNorma (None, 512)           2048        dense_1[0][0]                    
dropout_1 (Dropout)              (None, 512)           0           batchnormalization_1[0][0]       
dense_2 (Dense)                  (None, 80)            41040       dropout_1[0][0]                  
Total params: 2,140,752
Trainable params: 2,139,728
Non-trainable params: 1,024

The input to the BatchNormalization (BN) layer has size 512. According to the Keras documentation, the output shape of a BN layer is the same as its input shape, which is 512 here.

So why does the BN layer have 2048 parameters?

Wasi Ahmad asked Mar 01 '17 00:03

2 Answers

These 2048 parameters are in fact [gamma weights, beta weights, moving_mean (non-trainable), moving_variance (non-trainable)], each having 512 elements (the size of the BN layer's input, i.e. the output of dense_1, not the 4096-wide input layer).
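To make the four vectors concrete, here is a minimal pure-Python sketch of the state a BN layer holds and how it is used at inference time. It is not the Keras implementation: the helper names `make_bn_state` and `bn_inference` are hypothetical, and the initial values and `epsilon=1e-3` are assumed to mirror the Keras defaults.

```python
import math

def make_bn_state(num_features):
    """Build the four per-feature vectors a BN layer stores (assumed default inits)."""
    return {
        "gamma": [1.0] * num_features,            # trainable scale
        "beta": [0.0] * num_features,             # trainable shift
        "moving_mean": [0.0] * num_features,      # non-trainable running mean
        "moving_variance": [1.0] * num_features,  # non-trainable running variance
    }

def bn_inference(x, state, epsilon=1e-3):
    """Normalize one sample with the stored statistics, then scale and shift."""
    return [
        g * (xi - m) / math.sqrt(v + epsilon) + b
        for xi, g, b, m, v in zip(
            x,
            state["gamma"],
            state["beta"],
            state["moving_mean"],
            state["moving_variance"],
        )
    ]

state = make_bn_state(512)
total = sum(len(vec) for vec in state.values())
trainable = len(state["gamma"]) + len(state["beta"])
non_trainable = total - trainable
print(total, trainable, non_trainable)
```

Counting the entries gives 2048 in total, of which 1024 (gamma and beta) are trainable and 1024 (the moving statistics) are not.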

Monaj answered Oct 07 '22 21:10


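The batch normalization layer in Keras implements the paper by Ioffe and Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" (2015).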

As you can read there, to make batch normalization work during training, the layer has to keep track of the distribution of each normalized dimension. Since you are in mode=0 by default, it stores 4 parameters per feature coming from the previous layer: a learned scale (gamma), a learned shift (beta), and running estimates of the mean and the variance. Together these make sure the information is propagated and backpropagated properly.

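So 4 * 512 = 2048, which should answer your question. Half of those (2 * 512 = 1024, the running statistics) are not updated by backpropagation, which matches the 1,024 non-trainable params in your summary.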
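As a sanity check, the whole summary can be reproduced with plain arithmetic. This is only a sketch: `dense_params` and `batchnorm_params` are hypothetical helpers written for this answer, not Keras functions, and they assume a Dense layer with a bias term and a BN layer that keeps two trainable and two non-trainable values per feature.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Return (trainable, non_trainable) counts for a BN layer."""
    trainable = 2 * n_features      # gamma and beta
    non_trainable = 2 * n_features  # moving mean and moving variance
    return trainable, non_trainable

dense_1 = dense_params(4096, 512)
bn_trainable, bn_fixed = batchnorm_params(512)
dense_2 = dense_params(512, 80)

total = dense_1 + bn_trainable + bn_fixed + dense_2
non_trainable = bn_fixed
trainable = total - non_trainable

print("dense_1:", dense_1)
print("batchnormalization_1:", bn_trainable + bn_fixed)
print("dense_2:", dense_2)
print("total:", total, "trainable:", trainable, "non-trainable:", non_trainable)
```

The numbers line up with the summary in the question: 2097664, 2048 and 41040 per layer, and 2,140,752 in total with 1,024 non-trainable.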

Nassim Ben answered Oct 07 '22 20:10