I want to better understand the shapes of TensorFlow's BasicLSTMCell kernel and bias.
@tf_export("nn.rnn_cell.BasicLSTMCell")
class BasicLSTMCell(LayerRNNCell):
  # excerpt from the cell's build method:
  def build(self, inputs_shape):
    input_depth = inputs_shape[1].value
    h_depth = self._num_units
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))
Why does the kernel have the shape [input_depth + h_depth, 4 * self._num_units] and the bias the shape [4 * self._num_units]? Maybe the factor 4 comes from the forget gate, block input, input gate, and output gate? And what's the reason for summing input_depth and h_depth?
More information about my LSTM network:
num_input = 12, timesteps = 820, num_hidden = 64, num_classes = 2.
With tf.trainable_variables() I can list the network's variables and their shapes (reproduced in the sketch after the code below). The following code defines my LSTM network:
import tensorflow as tf
from tensorflow.contrib import rnn

def RNN(x, weights, biases):
    # Unstack (batch, timesteps, num_input) into `timesteps` tensors of shape (batch, num_input)
    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.BasicLSTMCell(num_hidden)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Linear projection of the last output to num_classes logits
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
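For reference, here is a minimal sketch of how the graph can be built and the trainable variables inspected; the placeholder and output-layer definitions are my assumption of a typical setup, not necessarily the original code. With num_input = 12 and num_hidden = 64, the shapes follow from the build code quoted above:

# Assumes the RNN function and hyperparameters defined above.
X = tf.placeholder(tf.float32, [None, timesteps, num_input])
weights = {'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))}
biases = {'out': tf.Variable(tf.random_normal([num_classes]))}
logits = RNN(X, weights, biases)

for v in tf.trainable_variables():
    print(v.name, v.shape)
# Prints something like (exact variable names vary by TF version):
#   rnn/basic_lstm_cell/kernel:0  (76, 256)   <- (12 + 64, 4 * 64)
#   rnn/basic_lstm_cell/bias:0    (256,)      <- (4 * 64,)
#   plus the (64, 2) and (2,) output-layer variables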
First, about summing input_depth and h_depth: RNNs generally follow equations like h_t = W*h_{t-1} + V*x_t to compute the state h at time t. That is, we apply one matrix multiplication to the last state and one to the current input, and add the two results. This is actually equivalent to concatenating h_{t-1} and x_t (let's just call this c), "stacking" the two matrices W and V (let's just call this S), and computing S*c.
Now we only have one matrix multiplication instead of two; I believe this can be parallelized more effectively, so it is done for performance reasons. Since h_{t-1} has size h_depth and x_t has size input_depth, we need to add the two dimensionalities for the concatenated vector c.
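To make the equivalence concrete, here is a small numpy sketch (the sizes match the question's num_input and num_hidden; I use the row-vector convention c @ S, which is what the kernel shape [input_depth + h_depth, ...] implies):

import numpy as np

input_depth, h_depth = 12, 64
x_t = np.random.randn(1, input_depth)      # current input (batch of 1)
h_prev = np.random.randn(1, h_depth)       # previous state

V = np.random.randn(input_depth, h_depth)  # weights applied to x_t
W = np.random.randn(h_depth, h_depth)      # weights applied to h_{t-1}

# Two multiplications, then an addition:
two_matmuls = x_t @ V + h_prev @ W

# One multiplication on the concatenation:
S = np.concatenate([V, W], axis=0)         # shape (76, 64): V stacked on W
c = np.concatenate([x_t, h_prev], axis=1)  # shape (1, 76)
one_matmul = c @ S

assert np.allclose(two_matmuls, one_matmul)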
Second, you are right about the factor 4 coming from the gates. This is essentially the same trick as above: instead of carrying out four separate matrix multiplications for the block input and each of the three gates, we carry out one multiplication that results in a big vector containing the block input and all three gate pre-activations concatenated. Then we can just split this vector into four parts. In the LSTM cell source code this happens in lines 627-633.
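Paraphrased in numpy (a sketch of the computation those lines perform, not the actual TensorFlow code; lstm_step and its argument names are mine), the whole cell step looks like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, kernel, bias, forget_bias=1.0):
    # One big matmul on the concatenated [x, h], then one split into four parts:
    gate_inputs = np.concatenate([x, h], axis=1) @ kernel + bias
    i, j, f, o = np.split(gate_inputs, 4, axis=1)  # input gate, block input, forget gate, output gate
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, new_c

num_units, input_depth = 64, 12
kernel = np.random.randn(input_depth + num_units, 4 * num_units)  # (76, 256)
bias = np.zeros(4 * num_units)                                    # (256,)
x = np.random.randn(1, input_depth)
h = np.zeros((1, num_units))
c = np.zeros((1, num_units))
h, c = lstm_step(x, h, c, kernel, bias)  # h.shape == c.shape == (1, 64)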