What is the difference between the He normal and Xavier normal initializers in Keras? Both seem to initialize weights based on the variance of the input data. Is there an intuitive explanation for the difference between the two?
Xavier (Glorot) initialization is an attempt to improve the initialization of neural network weights so as to avoid some long-standing training problems, in particular vanishing and exploding gradients. The weights are drawn at a scale chosen so that the signal neither shrinks toward zero nor blows up as it passes through successive layers.
The he_normal initializer draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / fan_in), where fan_in is the number of input units in the weight tensor.
The goal of Xavier initialization is to initialize the weights such that the variance of the activations is the same across every layer. This constant variance helps prevent the gradient from exploding or vanishing.
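To see that intuition numerically, here is a quick NumPy sketch of my own (not from the Keras docs; the layer count and sizes are arbitrary). It pushes data through a stack of tanh layers and prints the activation standard deviation per layer, once with Xavier-scaled weights and once with a much smaller stddev, where the activations collapse toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = fan_out = 256
x = rng.standard_normal((1000, fan_in))

def activation_stds(weight_std, n_layers=10):
    """Propagate x through n_layers tanh layers and record each layer's activation stddev."""
    h = x
    stds = []
    for _ in range(n_layers):
        w = rng.normal(0.0, weight_std, size=(fan_in, fan_out))
        h = np.tanh(h @ w)
        stds.append(round(float(h.std()), 3))
    return stds

xavier_std = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier scaling
print("xavier  :", activation_stds(xavier_std))  # stays at a healthy scale
print("std=0.01:", activation_stds(0.01))        # shrinks toward 0 layer by layer
```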
The glorot_normal initializer (initializers.glorot_normal) draws samples from a truncated normal distribution centered on 0 with stddev = sqrt(2 / (fan_in + fan_out)), where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
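In Keras both initializers are built in; a minimal sketch, assuming TensorFlow 2.x / tf.keras (the layer sizes and seed are arbitrary):

```python
from tensorflow.keras import initializers, layers

# Pass the initializer by its string name ...
relu_layer = layers.Dense(128, activation="relu", kernel_initializer="he_normal")

# ... or construct it explicitly, which also lets you fix a seed.
sigmoid_layer = layers.Dense(128, activation="sigmoid",
                             kernel_initializer=initializers.GlorotNormal(seed=42))

# For a weight tensor with fan_in inputs and fan_out outputs, the truncated-normal
# stddevs used by these initializers are:
#   he_normal:     sqrt(2 / fan_in)
#   glorot_normal: sqrt(2 / (fan_in + fan_out))
```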
See this discussion on Stats.SE:
In summary, the main difference for machine learning practitioners is the following (a small Keras sketch follows the list):
- He initialization works better for layers with ReLU activation.
- Xavier initialization works better for layers with sigmoid activation.
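Applied to a model, that rule of thumb might look like this hypothetical binary classifier (my own illustration, assuming tf.keras; the input size and layer widths are made up): He initialization for the ReLU hidden layers, Xavier (Glorot) for the sigmoid output layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # ReLU hidden layers: He normal initialization
    layers.Dense(64, activation="relu", kernel_initializer="he_normal",
                 input_shape=(20,)),
    layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    # Sigmoid output layer: Xavier/Glorot normal initialization
    layers.Dense(1, activation="sigmoid", kernel_initializer="glorot_normal"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```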