The official TensorFlow API docs claim that the parameter kernel_initializer defaults to None for tf.layers.conv2d and tf.layers.dense.
However, reading the layers tutorial (https://www.tensorflow.org/tutorials/layers), I noted that this parameter is not set in the code. For example:
# Convolutional Layer #1
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
The example code from the tutorial runs without any errors, so I think the default kernel_initializer is not None. So, which initializer is used?
In other code, I did not set the kernel_initializer of the conv2d and dense layers, and everything was fine. However, when I tried to set the kernel_initializer to tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32), I got NaN errors. What is going on here? Can anyone help?
From the documentation: If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used.
Great question! It is quite a trick to find out!
tf.layers.conv2d eventually creates its kernel variable through variable_scope.get_variable. In code:
self.kernel = vs.get_variable('kernel',
                              shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              trainable=True,
                              dtype=self.dtype)
Next step: what does the variable scope do when the initializer is None?
Here it says:
If initializer is None (the default), the default initializer passed in the constructor is used. If that one is None too, we use a new glorot_uniform_initializer.
So the answer is: it uses the glorot_uniform_initializer.
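As a quick sanity check (a minimal sketch assuming TensorFlow 1.x; the placeholder shape is just an example I chose), leaving kernel_initializer unset should give the same behaviour as passing the Glorot uniform initializer explicitly:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # example input, shape is arbitrary here

# Default: kernel_initializer=None falls back to glorot_uniform_initializer
conv_default = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 5], padding="same")

# Explicit: should behave the same (apart from the random seed)
conv_explicit = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 5], padding="same",
                                 kernel_initializer=tf.glorot_uniform_initializer())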
For completeness, here is the definition of this initializer:
The Glorot uniform initializer, also called the Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / (fan_in + fan_out)), fan_in is the number of input units in the weight tensor, and fan_out is the number of output units in the weight tensor. Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
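To make that formula concrete, here is a small back-of-the-envelope calculation (using the 5x5 convolution with 1 input channel and 32 filters from the tutorial snippet above as an assumed example):

import math

# Kernel shape [5, 5, 1, 32]: 5x5 receptive field, 1 input channel, 32 filters
receptive_field_size = 5 * 5
fan_in = 1 * receptive_field_size    # 25
fan_out = 32 * receptive_field_size  # 800

limit = math.sqrt(6.0 / (fan_in + fan_out))
print(limit)  # ~0.085, so the initial weights land in roughly [-0.085, 0.085]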
Edit: this is what I found in the code and documentation. Perhaps you could verify that the initialization looks like this by running eval on the weights!
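A rough sketch of that check (assuming TensorFlow 1.x; the layer name and input shape are placeholders I picked for the example):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
conv = tf.layers.conv2d(x, filters=32, kernel_size=[5, 5], padding="same", name="conv")

# The kernel variable created by the layer is named "<layer name>/kernel"
kernel = tf.get_default_graph().get_tensor_by_name("conv/kernel:0")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    weights = sess.run(kernel)
    # With glorot_uniform the values should stay within +/- sqrt(6 / (fan_in + fan_out))
    print(weights.min(), weights.max())  # here: roughly -0.085 .. 0.085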
According to this course by Andrew Ng and the Xavier documentation, if you are using ReLU as the activation function, it is better to change the default weight initializer (which is Xavier uniform) to Xavier normal:
y = tf.layers.conv2d(x, kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False), )
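Note that conv2d still needs its filters and kernel_size arguments in a real call. The same idea carries over to tf.layers.dense; a hedged sketch (the input tensor name flat and the unit count are only illustrative):

# "flat" stands in for your flattened feature tensor; 1024 units is just an example
h = tf.layers.dense(inputs=flat, units=1024, activation=tf.nn.relu,
                    kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))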