
How to configure a very simple LSTM with Keras / Theano for Regression

I am struggling to configure a Keras LSTM for a simple regression task. There is some very basic explanation at the official page: Keras RNN documentation

But to fully understand, example configurations with example data would be extremely helpful.

I have found hardly any examples of regression with a Keras LSTM; most examples are about classification (text or images). I've studied the LSTM examples that ship with the Keras distribution and one example I found through a Google search (http://danielhnyk.cz/). It offers some insight, though the author admits the approach is quite memory-inefficient, since data samples have to be stored very redundantly.

Although a commenter (Taha) introduced an improvement, data storage is still redundant, and I doubt this is how the Keras developers intended it to be done.

I've downloaded some simple sequential example data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data.

Date,       Open,      High,      Low,       Close,     Volume,   Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00,     91.669998, 90.00,     90.519997, 44188200, 90.519997

The table consists of more than 8900 such lines of Apple stock data. There are 7 columns (data points) for each day. The value to predict would be "AdjClose", which is the value at the end of the day.

So the goal would be to predict the AdjClose for the next day, based on the sequence of the previous few days. (This is probably next to impossible, but it is always good to see how a tool behaves under challenging conditions.)

I think this should be a very standard prediction/regression case for an LSTM and easily transferable to other problem domains.

So, how should the data be formatted (X_train, y_train) for minimum redundancy and how do I initialize the Sequential model with only one LSTM layer and a couple of hidden neurons?

Kind Regards, Theo

PS: I started coding this:

...
X_train
Out[6]: 
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
      2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
   [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
      2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
   [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
      2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
   ..., 
   [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
      9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
   [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
      9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
   [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
      9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]: 
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
   93.48999786,  94.55999756], dtype=float32)

So far, the data is ready. There is no redundancy introduced. Now the question is, how to describe a Keras LSTM model / training process on this data.

EDIT 3:

Here is the updated code with the 3D data structure required for recurrent networks. (See answer by Lorrit). It does not work, though.

EDIT 4: removed the extra comma after Activation('sigmoid'), shaped Y_train in the correct way. Still the same error.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense,  Activation, LSTM

nb_timesteps    =  4
nb_features     =  5
batch_size      = 32

# load file
X_train = np.genfromtxt('table.csv', 
                        delimiter=',',  
                        names=None, 
                        unpack=False,
                        dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)

Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1,len(X_train), nb_features)


# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0

for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample,i:i+nb_timesteps,:] for i in range(X_train.shape[1] - nb_timesteps + 1)])

    if flag==0:
        new_input = tmp
        flag = 1

    else:
        new_input = np.concatenate((new_input,tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
X_train = np.delete(X_train, 0, axis = 0)
# X successfully shaped

# free some memory
tmp = None
new_input = None


# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)

Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)


print('Build model...')

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)

print('Test score:', score)
print('Test accuracy:', acc)

There still seems to be an issue with the data, Keras says:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)Build model...

Traceback (most recent call last):

  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')

  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)

  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])

  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)

  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))

Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2
Theo H. asked May 19 '16 10:05



2 Answers

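In your model definition you placed a Dense layer before the LSTM layer. Declared with input_dim=nb_features, that Dense layer takes and returns 2D data, while the LSTM expects 3D input (hence "expected ndim=3, found ndim=2"). You need to wrap the Dense layer in a TimeDistributed layer.

Try changing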

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

to

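from keras.layers.wrappers import TimeDistributed

model = Sequential([
    # input_shape goes on the wrapper; the activation keyword is lowercase
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])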
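
To see why the wrapper matters, here is a minimal NumPy sketch (not Keras code) of what a time-distributed dense layer computes: the same weights are applied independently at every time step, so a 3D input stays 3D, which is what the following LSTM layer expects. The weights `W` and `b` and the random input are made up for illustration; the sizes mirror nb_timesteps = 4, nb_features = 5 and the 8 units from the question.

```python
import numpy as np

nb_samples, nb_timesteps, nb_features, nb_units = 3, 4, 5, 8

rng = np.random.default_rng(0)
X = rng.normal(size=(nb_samples, nb_timesteps, nb_features))  # 3D input batch
W = rng.normal(size=(nb_features, nb_units))  # illustrative shared weights
b = np.zeros(nb_units)                        # illustrative shared bias


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Matrix multiplication broadcasts over the leading (sample, time) axes,
# so every time step is transformed with the same W and b.
H = softmax(X @ W + b)
print(H.shape)  # (3, 4, 8)

# Equivalent formulation: apply the dense layer to each time step separately
# and stack the results back along the time axis.
step_by_step = np.stack(
    [softmax(X[:, t, :] @ W + b) for t in range(nb_timesteps)], axis=1
)
```

The output keeps the (samples, time steps, units) layout, so only the last axis changes size.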
sytrus answered Oct 04 '22 08:10


You are still missing one preprocessing step before feeding the data to the LSTM. You will have to decide how many previous data samples (previous days) you want to include in the calculation of the current day's AdjClose. See my answer here on how to do that. Your data should then be 3-dimensional of shape (nb_samples, nb_included_previous_days, features).
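
As a rough sketch of that preprocessing step, the windows can be built with NumPy's `sliding_window_view` (available in NumPy 1.20 and later), which returns views onto the original array instead of copies. The helper name `make_windows` and the toy data are hypothetical; the sketch assumes the rows are ordered oldest first and that the target for each window is the value on the day right after it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features, target, nb_timesteps):
    """Build (X, y) for next-day prediction.

    features: array of shape (n_days, nb_features), oldest day first
    target:   array of shape (n_days,)
    Returns X with shape (n_days - nb_timesteps, nb_timesteps, nb_features)
    and y with shape (n_days - nb_timesteps,).
    """
    # The window axis is appended last: (n_days - T + 1, nb_features, T)
    windows = sliding_window_view(features, nb_timesteps, axis=0)
    # Reorder to (samples, time steps, features)
    windows = windows.transpose(0, 2, 1)
    # The last window has no following day to predict, so drop it
    X = windows[:-1]
    # Window i covers days i .. i+T-1; its target is day i+T
    y = target[nb_timesteps:]
    return X, y


# Toy data standing in for the stock table (values are made up)
n_days, nb_features, nb_timesteps = 10, 5, 4
features = np.arange(n_days * nb_features, dtype=np.float32).reshape(n_days, nb_features)
target = np.arange(n_days, dtype=np.float32) * 10.0

X, y = make_windows(features, target, nb_timesteps)
print(X.shape, y.shape)  # (6, 4, 5) (6,)
```

Because `X` is a read-only view, the overlapping windows take no extra memory until a later step (for example shuffling or a dtype conversion) forces a copy.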

Then you can feed the 3D array to a standard LSTM layer with one output. You can compare this value to y_train and try to minimize the error. Remember to pick a loss function that is appropriate for regression, e.g. mean squared error.
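
The data flow described above can be illustrated with a plain NumPy forward pass: an LSTM reads the 3D array one time step at a time, its last hidden state goes through a single linear output unit, and the prediction is scored with mean squared error. This is only a sketch under stated assumptions, not how Keras implements the layer: the function names, the random weights, the random data and the gate ordering (input, forget, candidate, output) are all chosen for illustration, and no training is performed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_state(X, W, U, b):
    """Run a single LSTM layer over X and return the last hidden state.

    X: (samples, timesteps, features)
    W: (features, 4 * units)  input weights for the four gates
    U: (units, 4 * units)     recurrent weights for the four gates
    b: (4 * units,)           biases
    Returns an array of shape (samples, units).
    """
    n_samples, units = X.shape[0], U.shape[0]
    h = np.zeros((n_samples, units))  # hidden state
    c = np.zeros((n_samples, units))  # cell state
    for t in range(X.shape[1]):
        z = X[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)  # illustrative gate ordering
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def mse(y_true, y_pred):
    """Mean squared error, a common regression loss."""
    return float(np.mean((y_true - y_pred) ** 2))


rng = np.random.default_rng(0)
nb_samples, nb_timesteps, nb_features, units = 6, 4, 5, 4
X = rng.normal(size=(nb_samples, nb_timesteps, nb_features))  # made-up inputs
y = rng.normal(size=(nb_samples, 1))                          # made-up targets

W = rng.normal(scale=0.1, size=(nb_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
w_out = rng.normal(scale=0.1, size=(units, 1))  # single linear output unit
b_out = np.zeros(1)

h_last = lstm_last_state(X, W, U, b)  # shape (6, 4)
y_pred = h_last @ w_out + b_out       # shape (6, 1), same shape as y
loss = mse(y, y_pred)
print(h_last.shape, y_pred.shape, loss)
```

The output unit here is linear. A sigmoid output is bounded to (0, 1), so it can only match targets that have first been scaled into that range.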

Lorrit answered Oct 04 '22 08:10