I want to code up one time step in an LSTM. My focus is on understanding how the forget gate layer, the input gate layer, the candidate values, and the previous and new cell states work.
Let's assume that my hidden state at t-1 and xt are the following. For simplicity, let's assume that the weight matrices are identity matrices and all biases are zero.
htminus1 = np.array( [0, 0.5, 0.1, 0.2, 0.6] )
xt = np.array( [-0.1, 0.3, 0.1, -0.25, 0.1] )
I understand that the forget gate is the sigmoid of htminus1 and xt.
So, is it the following?
ft = 1 / ( 1 + np.exp( -( htminus1 + xt ) ) )
>> ft = array([0.47502081, 0.68997448, 0.549834 , 0.4875026 , 0.66818777])
I am referring to this link to implement one iteration of one LSTM block. The link says that ft should be 0 or 1. Am I missing something here?
How do I get the forget gate layer as per the schema given in the picture below? An example would be illustrative for me.

Along the same lines, how do I get the input gate layer, it, and the vector of new candidate values, \tilde{C}_t, as per the following picture?

Finally, how do I get the new hidden state ht as per the scheme given in the following picture?
A simple example would help me understand. Thanks in advance.

This is not obvious from the figures, but here is how it works:
If you see two lines joining to form a single line, it's a concatenation operation. You have interpreted it as an addition.
Wherever you see a sigmoid or tanh block, a multiplication with a trainable weight matrix (plus a bias term) is implied.
If two lines are joined by an explicit x or +, you are doing element-wise multiplication or addition, respectively.
So instead of sigmoid(htminus1 + xt), which is what you have, the correct operation would be sigmoid(Wf @ np.concatenate((htminus1, xt)) + bf). Wf is the matrix of trainable parameters and bf is the corresponding bias vector; the bias is added before the sigmoid is applied. Because the concatenated vector has length 10 here, Wf has shape (5, 10), so it cannot be an identity matrix.
Note that I have just written the equations on the right side of the images in numpy, not much else. Interpret [a, b] as the concatenation of a and b.
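As a minimal sketch of the forget gate with concatenation, here is the computation using the vectors from the question. The weight matrix below is a hypothetical choice for illustration: two 5x5 identity blocks placed side by side, with zero bias. With this particular Wf, the product Wf @ [htminus1, xt] happens to equal htminus1 + xt, so it reproduces the numbers in the question; with trained weights it generally would not.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; every output lies strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])

# Concatenation gives a vector of length 10, not an element-wise sum.
hx = np.concatenate((htminus1, xt))

# Assumed weights for illustration only: [I | I] has shape (5, 10).
Wf = np.hstack((np.eye(5), np.eye(5)))
bf = np.zeros(5)

# The bias is added inside the sigmoid.
ft = sigmoid(Wf @ hx + bf)
print(ft)
```

Each entry of ft is a value between 0 and 1 rather than exactly 0 or 1, which is what the sigmoid produces by construction.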
You can define the other operations similarly.
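# sigmoid(z) = 1 / (1 + np.exp(-z)); @ is matrix multiplication, * is element-wise
ft = sigmoid(Wf @ np.concatenate((htminus1, xt)) + bf)
it = sigmoid(Wi @ np.concatenate((htminus1, xt)) + bi)
Ctt = np.tanh(Wc @ np.concatenate((htminus1, xt)) + bc)
Ot = sigmoid(Wo @ np.concatenate((htminus1, xt)) + bo)
Ct = (Ctminus1 * ft) + (Ctt * it)  # Ctminus1 is the previous cell state C_{t-1}
ht = Ot * np.tanh(Ct)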
Note: I have represented \tilde{C}_t as Ctt.
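Putting the equations together, one full time step might look like the sketch below. Everything not given in the question is an assumption made for illustration: the helper name lstm_step, the dictionaries W and b keyed by gate, the small random weights drawn from a seeded generator, and a previous cell state Ctminus1 of all zeros.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(xt, htminus1, Ctminus1, W, b):
    """One LSTM time step. W and b are dicts keyed by 'f', 'i', 'c', 'o'."""
    hx = np.concatenate((htminus1, xt))    # [h_{t-1}, x_t]
    ft = sigmoid(W["f"] @ hx + b["f"])     # forget gate
    it = sigmoid(W["i"] @ hx + b["i"])     # input gate
    Ctt = np.tanh(W["c"] @ hx + b["c"])    # candidate values
    Ot = sigmoid(W["o"] @ hx + b["o"])     # output gate
    Ct = ft * Ctminus1 + it * Ctt          # new cell state (element-wise)
    ht = Ot * np.tanh(Ct)                  # new hidden state
    return ht, Ct

n = 5
htminus1 = np.array([0, 0.5, 0.1, 0.2, 0.6])
xt = np.array([-0.1, 0.3, 0.1, -0.25, 0.1])
Ctminus1 = np.zeros(n)                     # assumed previous cell state

# Assumed weights: small random values, one (n, 2n) matrix per gate.
rng = np.random.default_rng(0)
W = {k: rng.normal(scale=0.1, size=(n, 2 * n)) for k in "fico"}
b = {k: np.zeros(n) for k in "fico"}

ht, Ct = lstm_step(xt, htminus1, Ctminus1, W, b)
print("Ct =", Ct)
print("ht =", ht)
```

To process a sequence, you would call lstm_step once per time step and feed the returned ht and Ct back in as htminus1 and Ctminus1 for the next input.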