I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values. I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training. For example let's assume we want to learn the given sequence: <pre class="prettyprint"><code>[1,2,3,4,5,6,7,8,9,10,11,...] </code></pre> And we unroll the network for <code>num_steps=4</code>. Option 1 <pre class="prettyprint"><code>input data label 1,2,3,4 2,3,4,5 5,6,7,8 6,7,8,9 9,10,11,12 10,11,12,13 ... </code></pre> Option 2 <pre class="prettyprint"><code>input data label 1,2,3,4 2,3,4,5 2,3,4,5 3,4,5,6 3,4,5,6 4,5,6,7 ... </code></pre> Option 3 <pre class="prettyprint"><code>input data label 1,2,3,4 5 2,3,4,5 6 3,4,5,6 7 ... </code></pre> Option 4 <pre class="prettyprint"><code>input data label 1,2,3,4 5 5,6,7,8 9 9,10,11,12 13 ... </code></pre> Any help would be appreciated.

After reading several LSTM introduction blogs e.g. Jakob Aungiers', option 3 seems to be the right one for stateless LSTM. If your LSTMs need to remember data longer ago than your <code>num_steps</code>, your can train in a stateful way - for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch size greater than one, however. I guess that in your case a batch size of four with stateful LSTM could be used with the following data (written as <code>input -> label</code>): <pre class="prettyprint"><code>batch #0: 1,2,3,4 -> 5 2,3,4,5 -> 6 3,4,5,6 -> 7 4,5,6,7 -> 8 batch #1: 5,6,7,8 -> 9 6,7,8,9 -> 10 7,8,9,10 -> 11 8,9,10,11 -> 12 batch #2: 9,10,11,12 -> 13 ... </code></pre> By this, the state of e.g. the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1. This is somehow similar to your option 4, however you are not using all available labels there. Update: In extension to my suggestion where <code>batch_size</code> equals the <code>num_steps</code>, Alexis Huet gives an answer for the case of <code>batch_size</code> being a divisor of <code>num_steps</code>, which can be used for larger <code>num_steps</code>. He describes it nicely on his blog.

How to train a RNN with LSTM cells for time series prediction

Tags:

I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values.

I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.

For example let's assume we want to learn the given sequence:

[1,2,3,4,5,6,7,8,9,10,11,...]

And we unroll the network for num_steps=4.

Option 1

input data               label     
1,2,3,4                  2,3,4,5
5,6,7,8                  6,7,8,9
9,10,11,12               10,11,12,13
...

Option 2

input data               label     
1,2,3,4                  2,3,4,5
2,3,4,5                  3,4,5,6
3,4,5,6                  4,5,6,7
...

Option 3

input data               label     
1,2,3,4                  5
2,3,4,5                  6
3,4,5,6                  7
...

Option 4

input data               label     
1,2,3,4                  5
5,6,7,8                  9
9,10,11,12               13
...

Any help would be appreciated.

673

asked Mar 12 '16 17:03

Jakob

2 Answers

I'm just about to learn LSTMs in TensorFlow and try to implement an example which (luckily) tries to predict some time-series / number-series genereated by a simple math-fuction.

But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:

LSTM Future Predictor Model

Option 5:

input data               label     
1,2,3,4                  5,6,7,8
2,3,4,5                  6,7,8,9
3,4,5,6                  7,8,9,10
...

Beside this paper, I (tried) to take inspiration by the given TensorFlow RNN examples. My current complete solution looks like this:

import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)

    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value

        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)

learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()

        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias

        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])

        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))

train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)

        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("@{}: {}".format(step, current_loss))
        total_loss += current_loss

    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))

Sample output of this looks like this:

### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
#  `-> Prediction: [model prediction]  
[  33.   53.  113.  213.] -> [  353.   533.   753.  1013.])
 `-> Prediction: [ 19.74548721  28.3149128   33.11489105  35.06603241]
[ -17.  -32.  -77. -152.] -> [-257. -392. -557. -752.])
 `-> Prediction: [-16.38951683 -24.3657589  -29.49801064 -31.58583832]
[ -7.  -4.   5.  20.] -> [  41.   68.  101.  140.])
 `-> Prediction: [ 14.14126873  22.74848557  31.29668617  36.73633194]
...

The model is a LSTM-autoencoder having 2 layers each.

Unfortunately, as you can see in the results, this model does not learn the sequence properly. I might be the case that I'm just doing a bad mistake somewhere, or that 1000-10000 training steps is just way to few for a LSTM. As I said, I'm also just starting to understand/use LSTMs properly. But hopefully this can give you some inspiration regarding the implementation.

121

answered Oct 14 '22 09:10

b3nk4n

After reading several LSTM introduction blogs e.g. Jakob Aungiers', option 3 seems to be the right one for stateless LSTM.

If your LSTMs need to remember data longer ago than your num_steps, your can train in a stateful way - for a Keras example see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch size greater than one, however. I guess that in your case a batch size of four with stateful LSTM could be used with the following data (written as input -> label):

batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8

batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12

batch #2:
9,10,11,12 -> 13
...

By this, the state of e.g. the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.

This is somehow similar to your option 4, however you are not using all available labels there.

Update:

In extension to my suggestion where batch_size equals the num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.

answered Oct 14 '22 09:10

Robert Pollak

Related questions
                            
                                Requirejs vs browserify vs webpack for js loading order: am I just moving the situation from one side to another?
                            
                                Prevent Google play from auto translating my app description
                            
                                CSharpAddImportCodeFixProvider encountered an error and has been disabled
                            
                                How to programmatically get the list of registered users in Firebase [duplicate]
                            
                                Advantages & Disadvantages of Brokered vs non-brokered Messaging Systems
                            
                                CSS Font-Family Support Dropped for <SELECT> in Firefox?
                            
                                Windows 10 Docker Host - Display GUI application from Linux Container
                            
                                Is Google Sign-In a free or paid service? [closed]
                            
                                Return transition not working correctly when using fragment shared transitions
                            
                                Python introspection: get the argument list of a method_descriptor?
                            
                                Interpreting the sum of TF-IDF scores of words across documents
                            
                                vs code and intellisense for CSS Grid and CSS Modules

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With