Why is the number of parameters of the GRU layer 9600?
Shouldn't it be ((16 + 32) * 32 + 32) * 3 * 2 = 9,408, or, rearranging, 32 * (16 + 32 + 1) * 3 * 2 = 9,408?
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=4500, output_dim=16, input_length=200),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
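For reference, model.summary() reports 9,600 parameters for the Bidirectional GRU layer (the Embedding layer contributes 4500 * 16 = 72,000, and the two Dense layers 64 * 6 + 6 = 390 and 6 * 1 + 1 = 7, since the Bidirectional wrapper concatenates the two 32-unit outputs into 64).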
The key is that TensorFlow keeps separate biases for the input and recurrent kernels when the parameter reset_after=True in GRUCell. You can see this in the GRUCell source code:
if self.use_bias:
    if not self.reset_after:
        bias_shape = (3 * self.units,)
    else:
        # separate biases for input and recurrent kernels
        # Note: the shape is intentionally different from CuDNNGRU biases
        # `(2 * 3 * self.units,)`, so that we can distinguish the classes
        # when loading and converting saved weights.
        bias_shape = (2, 3 * self.units)
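Plugging the model's numbers into this (input dimension 16, units 32, two directions), the per-weight counts work out as in the sketch below; the arithmetic is just a breakdown of the total, not library code:

units, input_dim = 32, 16
kernel = input_dim * 3 * units        # input kernel:     16 * 96 = 1536
recurrent = units * 3 * units         # recurrent kernel: 32 * 96 = 3072
bias = 2 * 3 * units                  # two bias vectors:  2 * 96 =  192
per_direction = kernel + recurrent + bias   # 4800
print(per_direction * 2)              # 9600 for the Bidirectional wrapper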
Taking the reset gate as an example, we generally see the formula

r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)

with a single bias vector b_r. But if we set reset_after=True, the gate actually uses two bias vectors, one for the input term and one for the recurrent term:

r_t = \sigma(W_r x_t + b_{ir} + U_r h_{t-1} + b_{hr})

The same doubling applies to the update gate and the candidate state, which is why the bias count becomes 2 * 3 * units per direction.
As you can see, the default for GRU is reset_after=True in TensorFlow 2, whereas it is reset_after=False in TensorFlow 1.x.
So in TensorFlow 2 the number of parameters of this GRU layer is ((16 + 32) * 32 + 32 + 32) * 3 * 2 = 9,600.
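You can confirm both counts directly; a minimal sketch, assuming TensorFlow 2.x:

import tensorflow as tf

# reset_after defaults to True in TF2: separate input and recurrent biases
bi_new = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32))
bi_new.build((None, 200, 16))
print(bi_new.count_params())  # 9600

# reset_after=False reproduces the TF1.x default: one bias per gate
bi_old = tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, reset_after=False))
bi_old.build((None, 200, 16))
print(bi_old.count_params())  # 9408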