
How to set weights of the batch normalization layer?


How do I set weights of the batch normalization layer of Keras?

I am a bit confused by the documentation:

weights: Initialization weights. List of 2 Numpy arrays, with shapes: [(input_shape,), (input_shape,)] Note that the order of this list is [gamma, beta, mean, std]

Do we need all four [gamma, beta, mean, std]? Is there a way to set weights using only [gamma, beta]?

asked Mar 14 '17 by Prasanna


People also ask

Does batch normalization initialize weights?

Using batch normalization allows us to use much higher learning rates, which further increases the speed at which networks train. It also makes weights easier to initialize: weight initialization can be difficult, and it's even more difficult when creating deeper networks.

Does layer normalization depend on batch size?

Layer normalization is independent of the batch size, so it can be applied to batches with smaller sizes as well. Batch normalization requires different processing at training and inference times.

How many parameters does a batch normalization layer have?

Just like the parameters (e.g. weights, biases) of any network layer, a Batch Norm layer has parameters of its own: two learnable parameters called beta and gamma, plus two non-trainable statistics, the moving mean and variance.
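
As a rough illustration (not from the original page, and assuming a recent tf.keras), you can verify this directly; the feature size of 8 below is arbitrary:

    import tensorflow as tf

    layer = tf.keras.layers.BatchNormalization()
    layer.build(input_shape=(None, 8))  # 8 features, chosen arbitrarily

    # Four per-feature arrays: gamma, beta, moving_mean, moving_variance
    print([w.shape for w in layer.get_weights()])    # [(8,), (8,), (8,), (8,)]
    print(len(layer.trainable_weights))              # 2 -> gamma, beta
    print(len(layer.non_trainable_weights))          # 2 -> moving_mean, moving_variance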

Where should I put a batch normalization layer?

In practice, Batch Normalization is added either just before or just after a layer's activation function. Many practitioners report good results placing Batch Normalization after the activation layer, as in the sketch below.
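
Purely as a hedged sketch (assuming current tf.keras, with arbitrary layer sizes), the "after the activation" placement could look like this; placing BatchNormalization before the Activation is the other common choice:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32,)),
        tf.keras.layers.Dense(64, activation="relu"),   # activation first ...
        tf.keras.layers.BatchNormalization(),           # ... then normalize its outputs
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.summary()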


1 Answer

Yes, you need all four values. Recall what batch normalization does: its goal is to normalize (i.e. mean = 0 and standard deviation = 1) the inputs coming into each layer, and for that you need (mean, std). A normalized activation can then be viewed as the input to a sub-network that applies a linear transformation:

y = gamma*x_norm + beta

(gamma, beta) are just as important: they complement (mean, std) in the sense that they let the network recover the original activations from the normalized ones. If you drop them, or change any one parameter without considering the others, you risk changing the semantic meaning of the activations. The resulting activations are then processed by the next layer, and this is repeated for every layer.
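
As a hedged sketch of what this means in practice: the question quotes the old Keras 1.x weights= constructor argument with order [gamma, beta, mean, std]; in current tf.keras the closest equivalent is layer.set_weights(), whose order is [gamma, beta, moving_mean, moving_variance] (variance rather than std). The feature size below is arbitrary:

    import numpy as np
    import tensorflow as tf

    n_features = 8                                        # arbitrary feature size
    gamma = np.ones(n_features, dtype="float32")          # scale
    beta = np.zeros(n_features, dtype="float32")          # shift
    mean = np.zeros(n_features, dtype="float32")          # moving mean
    variance = np.ones(n_features, dtype="float32")       # moving variance

    layer = tf.keras.layers.BatchNormalization()
    layer.build(input_shape=(None, n_features))

    # All four arrays must be supplied, in this order:
    layer.set_weights([gamma, beta, mean, variance])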

Edit:

On the other hand, I think it would be worth trying to first compute the mean and std on a large number of images and use those as your mean and std. Make sure the images you compute the mean and std on come from the same distribution as your training data. I think this should work, as batch normalization usually has two modes for computing the mean: a running average maintained over batches, and a global mean (at least in Caffe, see here).
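
A minimal sketch of this idea, assuming current tf.keras: estimate per-channel statistics from a sample drawn from the training distribution and write them into the layer's moving statistics, keeping the default gamma = 1 and beta = 0. The random array is just a stand-in for your own images:

    import numpy as np
    import tensorflow as tf

    images = np.random.rand(1000, 32, 32, 3).astype("float32")   # stand-in for your image sample

    # Per-channel statistics over the sample (same distribution as the training data)
    mean = images.mean(axis=(0, 1, 2))
    var = images.var(axis=(0, 1, 2))

    layer = tf.keras.layers.BatchNormalization()
    layer.build(input_shape=(None, 32, 32, 3))
    gamma, beta, _, _ = layer.get_weights()       # keep the default gamma=1, beta=0
    layer.set_weights([gamma, beta, mean, var])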

answered Sep 23 '22 by Autonomous