I noticed that TensorFlow 1.0 contains two Xavier initialization helpers in contrib. Both link to the same documentation page and have the same signature:
tf.contrib.layers.xavier_initializer(uniform=True, seed=None, dtype=tf.float32)
tf.contrib.layers.xavier_initializer_conv2d(uniform=True, seed=None, dtype=tf.float32)
However, the difference between them is not explained anywhere. I can guess from the name that the _conv2d
version is meant for 2D convolutional layers, but would using the regular version instead have a noticeable impact?
There is actually no difference: both functions implement the weight initialization scheme from:
Xavier Glorot and Yoshua Bengio (2010): Understanding the difficulty of training deep feedforward neural networks. International conference on artificial intelligence and statistics.
Both initializers are designed to keep the scale of the gradients roughly the same in all layers. For the uniform distribution this ends up being the range [-x, x] with x = sqrt(6. / (in + out)),
and for the normal distribution a standard deviation of sqrt(2. / (in + out))
is used, so that both distributions have the same variance, 2 / (in + out).
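To make the two formulas concrete, here is a minimal NumPy sketch of the same scheme (not the TF source; the function names `xavier_uniform` and `xavier_normal` are my own). A uniform variable on [-x, x] has variance x²/3, so with x = sqrt(6 / (in + out)) both variants end up with variance 2 / (in + out):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Sample weights from U[-x, x] with x = sqrt(6 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out, rng=None):
    """Sample weights from N(0, sigma^2) with sigma = sqrt(2 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng(0)
    stddev = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

# Both variants target variance 2 / (fan_in + fan_out) = 2 / 400 = 0.005 here.
W = xavier_uniform(300, 100)
print(W.shape, W.var())
```

Since the formulas only depend on fan-in and fan-out, there is nothing conv2d-specific to implement, which is consistent with the two TF helpers behaving identically.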