How do I set weights of the batch normalization layer of Keras?
I am a bit confused by the documentation:
weights: Initialization weights. List of 2 Numpy arrays, with shapes: [(input_shape,), (input_shape,)] Note that the order of this list is [gamma, beta, mean, std]
Do we need all four [gamma, beta, mean, std]? Is there a way to set weights using only [gamma, beta]?
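For concreteness, this is roughly what I am trying to do (a minimal sketch assuming tf.keras, where set_weights takes the same list the docstring describes; the values are just placeholders):

import numpy as np
from tensorflow.keras import layers

n_features = 8
bn = layers.BatchNormalization()
bn.build((None, n_features))            # creates the four weight arrays

gamma    = np.ones(n_features)          # scale
beta     = np.zeros(n_features)         # shift
mean     = np.zeros(n_features)         # moving mean
variance = np.ones(n_features)          # moving variance (tf.keras stores the variance rather than the std)

bn.set_weights([gamma, beta, mean, variance])   # works
# bn.set_weights([gamma, beta])                 # what I would like to do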
Using batch normalization allows us to use much higher learning rates, which further increases the speed at which networks train. It also makes weights easier to initialize: weight initialization can be difficult, and it becomes even more difficult when creating deeper networks.
Layer normalization, by contrast, is independent of the batch size, so it can be applied to smaller batches as well. Batch normalization requires different processing at training and inference time.
Just like the parameters (e.g. weights, biases) of any other network layer, a Batch Norm layer has parameters of its own: two learnable parameters called gamma and beta, plus the non-learnable moving mean and standard deviation it tracks.
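You can see this by inspecting the layer directly. A minimal sketch, assuming tf.keras 2.x (the attribute and variable names are from that API):

from tensorflow.keras import layers

bn = layers.BatchNormalization()
bn.build((None, 4))   # create the layer's variables for 4 features

# gamma and beta are trained by backprop; the moving statistics are not
print([w.name for w in bn.trainable_weights])      # gamma, beta
print([w.name for w in bn.non_trainable_weights])  # moving_mean, moving_variance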
In practical coding, Batch Normalization is added either just before or just after a layer's activation function. Mostly, researchers have found good results when placing Batch Normalization after the activation layer, as in the sketch below.
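A minimal placement sketch, assuming tf.keras 2.x; the layer sizes are arbitrary placeholders:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),            # normalizes the activated outputs
    layers.Dense(10, activation="softmax"),
])
model.summary()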
Yes, you need all four values. Recall what batch normalization does: its goal is to normalize (i.e. to mean = 0 and standard deviation = 1) the inputs coming into each layer. For this you need (mean, std), which give the normalized activation x_norm = (x - mean) / std. That normalized activation can then be viewed as the input to a sub-network which applies a linear transformation:

y = gamma * x_norm + beta

(gamma, beta) are very important since they complement (mean, std): they let the network recover the original activations from the normalized ones. If you drop them, or change any one parameter without considering the others, you risk changing the semantic meaning of the activations. These original activations can now be processed by your next layer, and the process is repeated for all layers.
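A tiny numpy illustration of that point (not Keras-specific; the values are made up): with gamma = std and beta = mean, the linear transform exactly undoes the normalization.

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])      # some activations
mean, std = x.mean(), x.std()

x_norm = (x - mean) / std               # normalized: mean 0, std 1
gamma, beta = std, mean                 # one particular choice of the learnable parameters
y = gamma * x_norm + beta               # recovers the original activations

print(np.allclose(y, x))                # True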
Edit:
On the other hand, I think it would be worth trying to first compute the mean and std on a large number of images and use those as your mean and std. Take care that the images you compute the mean and std on come from the same distribution as your training data. I think this should work, as batch normalization usually has two modes for computing the mean: one is a running average maintained over batches, and the other is a global mean (at least in Caffe, see here).
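A rough sketch of that idea, assuming tf.keras 2.x; `images` is a hypothetical array standing in for data drawn from your training distribution:

import numpy as np
from tensorflow.keras import layers

images = np.random.rand(1000, 32, 32, 3)        # placeholder for your real images

channel_mean = images.mean(axis=(0, 1, 2))      # per-channel statistics
channel_var  = images.var(axis=(0, 1, 2))       # tf.keras stores variance rather than std

bn = layers.BatchNormalization()                # normalizes the last (channel) axis by default
bn.build((None, 32, 32, 3))

gamma, beta, _, _ = bn.get_weights()            # keep the learnable parameters as they are
bn.set_weights([gamma, beta, channel_mean, channel_var])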