 

How to use He initialization in TensorFlow

He / MSRA initialization, from Delving Deep into Rectifiers, seems to be a recommended weight initialization when using ReLUs.

Is there a built-in way to use this in TensorFlow? (similar to: How to do Xavier initialization on TensorFlow)?

asked Aug 14 '18 by matwilso

People also ask

How do I use Xavier initializer in TensorFlow?

Xavier initialization is just sampling a (usually Gaussian) distribution whose variance is a function of the number of neurons. tf.random_normal can do that for you; you just need to compute the stddev from the number of neurons represented by the weight matrix you're initializing.
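The idea can be sketched in NumPy (a hedged illustration of the math, not TensorFlow's exact implementation; the 784x256 shape is a hypothetical example): for Xavier/Glorot normal initialization the stddev is sqrt(2 / (fan_in + fan_out)).

```python
import numpy as np

# Hypothetical layer shape: 784 inputs, 256 outputs (illustration only)
fan_in, fan_out = 784, 256

# Xavier / Glorot normal: variance 2 / (fan_in + fan_out)
stddev = np.sqrt(2.0 / (fan_in + fan_out))

# This stddev is what you would pass to tf.random_normal
rng = np.random.default_rng(0)
W = rng.normal(loc=0.0, scale=stddev, size=(fan_in, fan_out))
```

With ~200k samples the empirical standard deviation of W lands very close to the computed stddev.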

What is the default initialization in TensorFlow?

From the documentation: If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used. The glorot_uniform_initializer function initializes values from a uniform distribution.
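As a sketch of what glorot_uniform_initializer does (a NumPy illustration under the standard Glorot formula, not TensorFlow's code; the shape is hypothetical): it samples uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), which gives the same variance 2 / (fan_in + fan_out) as the normal variant.

```python
import numpy as np

fan_in, fan_out = 784, 256  # hypothetical layer shape

# Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
rng = np.random.default_rng(0)
W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
```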

What is kernel initializer in TensorFlow?

Initializers define the way to set the initial random weights of Keras layers. The keyword arguments used for passing initializers to layers depend on the layer. Usually, it is simply kernel_initializer and bias_initializer:

from tensorflow.keras import layers
from tensorflow.keras import initializers

layer = layers.Dense(
    units=64,
    kernel_initializer=initializers.RandomNormal(stddev=0.01),
    bias_initializer=initializers.Zeros())


1 Answer

TensorFlow 2.0

tf.keras.initializers.HeUniform()

or

tf.keras.initializers.HeNormal()

See docs for usage. (h/t to @mable)

TensorFlow 1.0

tf.contrib.layers.variance_scaling_initializer(dtype=tf.float32)

This will give you He / MSRA initialization. The documentation states that the default arguments of tf.contrib.layers.variance_scaling_initializer correspond to He initialization, and that changing the arguments can yield Xavier initialization (this is what TF's internal implementation of Xavier initialization does).

Example usage:

W1 = tf.get_variable('W1', shape=[784, 256],
       initializer=tf.contrib.layers.variance_scaling_initializer())

or

initializer = tf.contrib.layers.variance_scaling_initializer()
W1 = tf.Variable(initializer([784,256]))
answered Oct 27 '22 by matwilso