The official TensorFlow API docs claim that the parameter kernel_initializer defaults to None for tf.layers.conv2d and tf.layers.dense.
However, reading the layers tutorial (https://www.tensorflow.org/tutorials/layers), I noted that this parameter is not set in the code. For example:
# Convolutional Layer #1
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
The example code from the tutorial runs without any errors, so I think the default kernel_initializer is not None. So, which initializer is used?
In other code, I did not set the kernel_initializer of the conv2d and dense layers, and everything was fine. However, when I tried to set the kernel_initializer to tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32), I got NaN errors. What is going on here? Can anyone help?
From the documentation: If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used.
Great question! It is quite a trick to find out!
tf.layers.conv2d eventually creates its kernel variable through variable_scope.get_variable. In code:
self.kernel = vs.get_variable('kernel',
                              shape=kernel_shape,
                              initializer=self.kernel_initializer,
                              regularizer=self.kernel_regularizer,
                              trainable=True,
                              dtype=self.dtype)
Next step: what does the variable scope do when the initializer is None?
Here it says:
If initializer is None (the default), the default initializer passed in the constructor is used. If that one is None too, we use a new glorot_uniform_initializer.
So the answer is: it uses the glorot_uniform_initializer.
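As a quick sanity check (a minimal sketch assuming TensorFlow 1.x; the placeholder shape is just an example I chose), leaving kernel_initializer unset should give the same behaviour as passing the Glorot uniform initializer explicitly:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # example input, shape is arbitrary here

# Default: kernel_initializer=None falls back to glorot_uniform_initializer
conv_default = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 5], padding="same")

# Explicit: should behave the same (apart from the random seed)
conv_explicit = tf.layers.conv2d(inputs=x, filters=32, kernel_size=[5, 5], padding="same",
                                 kernel_initializer=tf.glorot_uniform_initializer())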
For completeness, here is the definition of this initializer:
The Glorot uniform initializer, also called the Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / (fan_in + fan_out)), fan_in is the number of input units in the weight tensor, and fan_out is the number of output units in the weight tensor. Reference: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
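To make that formula concrete, here is a small back-of-the-envelope calculation (using the 5x5 convolution with 1 input channel and 32 filters from the tutorial snippet above as an assumed example):

import math

# Kernel shape [5, 5, 1, 32]: 5x5 receptive field, 1 input channel, 32 filters
receptive_field_size = 5 * 5
fan_in = 1 * receptive_field_size    # 25
fan_out = 32 * receptive_field_size  # 800

limit = math.sqrt(6.0 / (fan_in + fan_out))
print(limit)  # ~0.085, so the initial weights land in roughly [-0.085, 0.085]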
Edit: this is what I found in the code and documentation. Perhaps you could verify that the initialization looks like this by running eval on the weights!
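A rough sketch of that check (assuming TensorFlow 1.x; the layer name and input shape are placeholders I picked for the example):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])
conv = tf.layers.conv2d(x, filters=32, kernel_size=[5, 5], padding="same", name="conv")

# The kernel variable created by the layer is named "<layer name>/kernel"
kernel = tf.get_default_graph().get_tensor_by_name("conv/kernel:0")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    weights = sess.run(kernel)
    # With glorot_uniform the values should stay within +/- sqrt(6 / (fan_in + fan_out))
    print(weights.min(), weights.max())  # here: roughly -0.085 .. 0.085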
According to this course by Andrew Ng and the Xavier documentation, if you are using ReLU as the activation function, it is better to change the default weight initializer (which is Xavier uniform) to Xavier normal:
y = tf.layers.conv2d(x, kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False), )
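Note that conv2d still needs its filters and kernel_size arguments in a real call. The same idea carries over to tf.layers.dense; a hedged sketch (the input tensor name flat and the unit count are only illustrative):

# "flat" stands in for your flattened feature tensor; 1024 units is just an example
h = tf.layers.dense(inputs=flat, units=1024, activation=tf.nn.relu,
                    kernel_initializer=tf.contrib.layers.xavier_initializer(uniform=False))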