 

How to stack LSTM layers using TensorFlow

What I have is the following, which I believe is a network with one hidden LSTM layer:

# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 13
n_steps = 10
n_hidden = 512
n_classes = 13

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'out' : tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'out' : tf.Variable(tf.random_normal([n_classes]))
}

However, what I am trying to build is an LSTM network that predicts power consumption. I have been looking around for a good example, but I could not find any model with 2 hidden LSTM layers. Here's the model that I would like to build:

1 input layer, 1 output layer, 2 hidden LSTM layers (with 512 neurons in each), time step (sequence length): 10

Could anyone guide me through building this in TensorFlow (defining the weights, shaping the input, training, predicting, choosing the optimizer and cost function, etc.)? Any help would be much appreciated.

Thank you so much in advance!

asked Aug 25 '16 by subbie

People also ask

How do you stack multiple LSTM layers?

The solution is to add return_sequences=True to all LSTM layers except the last one, so that each layer's output tensor is 3D (batch size, timesteps, hidden state). Setting this flag tells Keras that the LSTM should return its output for every timestep rather than only the last one.

Should you stack LSTM layers?

"Stacking LSTM hidden layers makes the model deeper, more accurately earning the description as a deep learning technique ... The additional hidden layers are understood to recombine the learned representation from prior layers and create new representations at high levels of abstraction."

Can we add multiple LSTM layers?

We can continue to add hidden LSTM layers as long as the prior LSTM layer provides a 3D output as input for the subsequent layer.
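
For example, a minimal Keras sketch of the two-layer stack the question asks for might look like the following (the layer sizes come from the question; the Dense output layer, the Adam optimizer, and the mean-squared-error loss are assumptions for a regression-style prediction task):

# Two stacked LSTM layers; return_sequences=True on the first layer keeps its
# output 3D (batch, timesteps, units) so the second LSTM receives a sequence.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_steps, n_input, n_hidden, n_classes = 10, 13, 512, 13

model = Sequential([
    LSTM(n_hidden, return_sequences=True, input_shape=(n_steps, n_input)),
    LSTM(n_hidden),          # last LSTM returns only its final output (2D)
    Dense(n_classes),        # assumed output layer
])
model.compile(optimizer="adam", loss="mse")

model.fit can then be called on input arrays of shape (num_samples, 10, 13).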


2 Answers

Here is how I do it in a translation model with GRU cells. You can just replace the GRU with an LSTM. It is really easy: just use tf.nn.rnn_cell.MultiRNNCell with a list of the cells it should wrap. In the code below I am manually unrolling it, but you can pass it to tf.nn.dynamic_rnn or tf.nn.rnn as well.

# TensorFlow 1.x: the RNN cells live in tf.contrib.rnn
import tensorflow as tf
from tensorflow.contrib import rnn

y = input_tensor
with tf.variable_scope('encoder') as scope:
    # Stack three GRU cells; swap in rnn.LSTMCell(1024) for an LSTM
    rnn_cell = rnn.MultiRNNCell([rnn.GRUCell(1024) for _ in range(3)])
    state = rnn_cell.zero_state(BATCH_SIZE, tf.float32)
    output = [None] * TIME_STEPS
    for t in reversed(range(TIME_STEPS)):
        y_t = tf.reshape(y[:, t, :], (BATCH_SIZE, -1))
        output[t], state = rnn_cell(y_t, state)
        scope.reuse_variables()
    y = tf.stack(output, 1)  # tf.pack was renamed tf.stack in TF 1.0
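
If you prefer not to unroll the loop by hand, a sketch of the same stacked-cell idea driven by tf.nn.dynamic_rnn could look like this (TensorFlow 1.x API; the sizes are taken from the question and the variable names are mine, not part of the original answer):

import tensorflow as tf

BATCH_SIZE, TIME_STEPS, N_INPUT, N_HIDDEN, NUM_LAYERS = 128, 10, 13, 512, 2

input_tensor = tf.placeholder(tf.float32, [BATCH_SIZE, TIME_STEPS, N_INPUT])

# One fresh cell object per layer, wrapped by MultiRNNCell.
cells = [tf.nn.rnn_cell.LSTMCell(N_HIDDEN) for _ in range(NUM_LAYERS)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

initial_state = stacked_cell.zero_state(BATCH_SIZE, tf.float32)

# dynamic_rnn handles the time loop; outputs has shape (BATCH_SIZE, TIME_STEPS, N_HIDDEN).
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, input_tensor,
                                         initial_state=initial_state)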
answered Sep 24 '22 by chasep255


First you need some placeholders to hold your training data (one batch):

x_input = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])
y_output = tf.placeholder(tf.float32, [batch_size, truncated_series_length, 1])

An LSTM needs a state, which consists of two components, the hidden state and the cell state; there is a very good guide here: https://arxiv.org/pdf/1506.00019.pdf. For every layer in the LSTM you have one cell state and one hidden state.

The problem is that TensorFlow stores this in an LSTMStateTuple, which you cannot feed into a placeholder. So you need to store it in a tensor and then unpack it into a tuple:

state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])

# tf.unpack was renamed tf.unstack in TF 1.0
l = tf.unstack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(num_layers)]
)

Then you can use the built-in TensorFlow API to create the stacked LSTM layers.

# Use a fresh LSTMCell per layer; reusing one cell object ([cell] * num_layers)
# makes the layers share variables and fails in newer TF 1.x versions.
cells = [tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
         for _ in range(num_layers)]
cell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input, initial_state=rnn_tuple_state)

From here you continue with the outputs to calculate logits and then a loss with respect to y_output.
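
For that step, a rough sketch continuing from the tensors defined above might be the following (the mean-squared-error loss and the Adam optimizer are assumptions for the power-consumption task, and W_out/b_out are made-up names):

# Project each time step's output to a single value and compare with y_output.
W_out = tf.Variable(tf.random_normal([state_size, 1]))
b_out = tf.Variable(tf.zeros([1]))

flat_outputs = tf.reshape(outputs, [-1, state_size])     # (batch*steps, state_size)
logits = tf.matmul(flat_outputs, W_out) + b_out          # (batch*steps, 1)
logits = tf.reshape(logits, [batch_size, truncated_series_length, 1])

loss = tf.reduce_mean(tf.square(logits - y_output))      # simple MSE
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)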

Then you run each batch with sess.run, using truncated backpropagation (there is a good explanation here: http://r2rt.com/styles-of-truncated-backpropagation.html):

init_state = np.zeros((num_layers, 2, batch_size, state_size))

...current_state... = sess.run([...state...], feed_dict={x_input:batch_in, state_placeholder:current_state ...})
current_state = np.array(current_state)

You will have to convert the state to a numpy array before feeding it again.
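
Put together, a rough sketch of the full loop could look like this (get_batch and num_training_steps are hypothetical placeholders, and train_op/loss are the ones from the sketch above):

import numpy as np

current_state = np.zeros((num_layers, 2, batch_size, state_size))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_training_steps):
        batch_in, batch_out = get_batch()  # hypothetical data helper
        _, batch_loss, new_state = sess.run(
            [train_op, loss, state],
            feed_dict={x_input: batch_in,
                       y_output: batch_out,
                       state_placeholder: current_state})
        # The returned state is a tuple of LSTMStateTuples; convert it back to an
        # ndarray of shape (num_layers, 2, batch_size, state_size) before the next feed.
        current_state = np.array(new_state)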

Perhaps it is better to use a library like TFLearn or Keras instead?

answered Sep 22 '22 by user1506145