
Cannot stack LSTM with MultiRNNCell and dynamic_rnn

I am trying to build a multivariate time series prediction model. I followed this tutorial on temperature prediction: http://nbviewer.jupyter.org/github/addfor/tutorials/blob/master/machine_learning/ml16v04_forecasting_with_LSTM.ipynb

I want to extend that model to a multilayer LSTM model using the following code:

cell = tf.contrib.rnn.LSTMCell(hidden, state_is_tuple=True)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
output, _ = tf.nn.dynamic_rnn(cell=cell, inputs=features, dtype=tf.float32)

but I get an error saying:

ValueError: Dimensions must be equal, but are 256 and 142 for 'rnn/while/rnn/multi_rnn_cell/cell_0/cell_0/lstm_cell/MatMul_1' (op: 'MatMul') with input shapes: [?,256], [142,512].

When I tried this:

cell = []
for i in range(num_layers):
    cell.append(tf.contrib.rnn.LSTMCell(hidden, state_is_tuple=True))
cell = tf.contrib.rnn.MultiRNNCell(cell, state_is_tuple=True)
output, _ = tf.nn.dynamic_rnn(cell=cell, inputs=features, dtype=tf.float32)

I do not get that error, but the prediction is really bad.

I define hidden=128.

features = tf.reshape(features, [-1, n_steps, n_input]) has shape (?, 1, 14) in the single-layer case.

My data look like this: x.shape=(594, 14), y.shape=(591, 1).

I am confused about how to stack LSTM cells in TensorFlow. My TensorFlow version is 0.14.

zdarktknight asked Nov 18 '17




1 Answer

This is a very interesting question. Initially, I thought the two snippets would produce the same result (i.e., stacking two LSTM cells).

code 1

cell = tf.contrib.rnn.LSTMCell(hidden, state_is_tuple=True)
cell = [cell] * num_layers
print(cell)  # the same LSTMCell object repeated num_layers times
cell = tf.contrib.rnn.MultiRNNCell(cell, state_is_tuple=True)

code 2

cell = []
for i in range(num_layers):
    cell.append(tf.contrib.rnn.LSTMCell(hidden, state_is_tuple=True))
print(cell)  # a list of distinct LSTMCell objects
cell = tf.contrib.rnn.MultiRNNCell(cell, state_is_tuple=True)

However, printing the cell lists in the two cases produces something like the following.

code 1

[<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x000000000D7084E0>, <tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x000000000D7084E0>]

code 2

[<tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x000000000D7084E0>, <tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell object at 0x000000000D708B00>]

If you observe the results closely:

  • Code 1 prints a list of two LSTM cell objects in which one is a copy of the other (the two objects share the same memory address).
  • Code 2 prints a list of two distinct LSTM cell objects (the two addresses differ).
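This aliasing can be checked in plain Python, independent of TensorFlow (a sketch using a hypothetical stub class in place of a real LSTM cell):

```python
class CellStub:
    """Hypothetical stand-in for an LSTM cell (not a real TensorFlow class)."""
    def __init__(self, hidden):
        self.hidden = hidden

num_layers = 2

# Code 1 style: the same object repeated num_layers times.
shared = [CellStub(128)] * num_layers
# Code 2 style: a fresh object per layer.
separate = [CellStub(128) for _ in range(num_layers)]

print(shared[0] is shared[1])      # True: one cell, one set of weights
print(separate[0] is separate[1])  # False: independent cells
```

List multiplication copies references, not objects, which is why code 1 ends up asking one set of weights to serve both layers.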

Stacking two LSTM cells works as sketched in the diagram below.

(diagram: the output sequence of LSTM cell 1 feeds into LSTM cell 2 as its input sequence)

Therefore, looking at the big picture (the actual TensorFlow operations may differ), stacking does the following:

  1. First, map the inputs to the hidden units of LSTM cell 1 (in your case, 14 to 128).
  2. Second, map the hidden units of LSTM cell 1 to the hidden units of LSTM cell 2 (in your case, 128 to 128).
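The shapes in the error message line up with these two steps, assuming the common LSTM kernel layout of [input_dim + hidden, 4 * hidden] (input and previous state concatenated, projected onto the four gates):

```python
n_input, hidden = 14, 128

# Layer 1 kernel: built for 14-dimensional inputs.
layer1_kernel = (n_input + hidden, 4 * hidden)  # (142, 512), as in the error message
# Layer 2 would need a kernel built for 128-dimensional inputs.
layer2_kernel = (hidden + hidden, 4 * hidden)   # (256, 512)

# Reusing layer 1's kernel for layer 2 multiplies a [?, 256] input against a
# [142, 512] matrix: "Dimensions must be equal, but are 256 and 142".
print(layer1_kernel, layer2_kernel)
```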

Therefore, when you try to perform both operations with the same copy of the LSTM cell, you get an error, because the two steps require weight matrices of different dimensions.

However, if you set the number of hidden units equal to the number of input units (in your case, input 14 and hidden 14), there is no error even though you reuse the same LSTM cell, because the weight matrices then have identical dimensions.
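Assuming the same [input_dim + hidden, 4 * hidden] kernel layout, a quick check shows why reusing the cell happens to fit dimensionally when hidden equals the input size (a sketch of the shape arithmetic, not the actual TensorFlow internals):

```python
n_input = hidden = 14

# With hidden == n_input, both layers need a kernel of the same shape.
layer1_kernel = (n_input + hidden, 4 * hidden)  # (28, 56)
layer2_kernel = (hidden + hidden, 4 * hidden)   # (28, 56): identical, so reuse fits
print(layer1_kernel == layer2_kernel)  # True
```

The shapes match, so no error is raised, but the two layers would still share one set of weights, which is rarely what you want.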

Therefore, your second approach is the correct way to stack two LSTM cells.

Nipun Wijerathne answered Nov 28 '22