I am struggling to configure a Keras LSTM for a simple regression task. There is some very basic explanation on the official page: Keras RNN documentation
But to fully understand it, example configurations with example data would be extremely helpful.
I have found hardly any examples of regression with Keras LSTMs. Most examples are about classification (text or images). I've studied the LSTM examples that come with the Keras distribution and one example I found through a Google search: http://danielhnyk.cz/ It offers some insight, though the author admits the approach is quite memory-inefficient, since data samples have to be stored very redundantly.
Although a commenter (Taha) introduced an improvement, data storage is still redundant, and I doubt this is how the Keras developers meant it to be done.
I've downloaded some simple example sequential data, which happens to be stock data from Yahoo Finance. It is freely available from Yahoo Finance Data
Date, Open, High, Low, Close, Volume, Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00, 91.669998, 90.00, 90.519997, 44188200, 90.519997
The table consists of more than 8900 such lines of Apple stock data, with 7 columns = data points for each day. The value to predict would be "Adj Close", which is the value at the end of the day.
So the goal would be to predict the Adj Close for the next day, based on the sequence of the previous few days. (This is probably next to impossible, but it is always good to see how a tool behaves under challenging conditions.)
I think this should be a very standard prediction/regression case for an LSTM and easily transferable to other problem domains.
So, how should the data be formatted (X_train, y_train) for minimum redundancy and how do I initialize the Sequential model with only one LSTM layer and a couple of hidden neurons?
Kind Regards, Theo
PS: I started coding this:
...
X_train
Out[6]:
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
          2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
       [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
          2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
       [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
          2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
       ...,
       [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
          9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
       [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
          9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
       [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
          9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)
y_train
Out[7]:
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
        93.48999786,  94.55999756], dtype=float32)
So far, the data is ready. There is no redundancy introduced. Now the question is, how to describe a Keras LSTM model / training process on this data.
EDIT 3:
Here is the updated code with the 3D data structure required for recurrent networks (see the answer by Lorrit). It does not work, though.
EDIT 4: removed the extra comma after Activation('sigmoid') and shaped Y_train the correct way. Still the same error.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM
nb_timesteps = 4
nb_features = 5
batch_size = 32
# load file
X_train = np.genfromtxt('table.csv',
                        delimiter=',',
                        names=None,
                        unpack=False,
                        dtype=None)
# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)
# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)
# the last column is our Y
Y_train = X_train[:,6].astype(np.float32)
Y_train = np.delete(Y_train, range(0,6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)
# we don't use the timestamps. convert the rest to Float32
X_train = X_train[:, 1:6].astype(np.float32)
# shape X_train
X_train.shape = (1,len(X_train), nb_features)
# Now comes Lorrit's code for shaping the 3D-input-data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0
for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample, i:i+nb_timesteps, :]
                    for i in range(X_train.shape[1] - nb_timesteps + 1)])
    if flag == 0:
        new_input = tmp
        flag = 1
    else:
        new_input = np.concatenate((new_input, tmp))
X_train = np.delete(new_input, len(new_input) - 1, axis=0)
X_train = np.delete(X_train, 0, axis=0)
X_train = np.delete(X_train, 0, axis=0)
# X successfully shaped
# free some memory
tmp = None
new_input = None
# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)
Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)
print('Build model...')
model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
model.compile(loss='mse',
              optimizer='RMSprop',
              metrics=['accuracy'])
print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
There still seems to be an issue with the data; Keras says:
Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...
Traceback (most recent call last):
File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')
File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
Activation('sigmoid')
File "d:\git\keras\keras\models.py", line 93, in __init__
self.add(layer)
File "d:\git\keras\keras\models.py", line 146, in add
output_tensor = layer(self.outputs[0])
File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
self.assert_input_compatibility(x)
File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
str(K.ndim(x)))
Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2
The input data for an LSTM has to be 3D: it must be reshaped into (samples, time steps, features). This also means the targets must be aligned with the reshaped samples, one target per sample. You need to set a number of time steps for your problem, in other words, how many previous samples will be used to make each prediction.
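As a minimal sketch of that reshape (assuming a 2D float array data of shape (nb_days, nb_features), oldest row first; the helper name make_windows is illustrative, not part of Keras):
import numpy as np

nb_timesteps = 4  # assumed window length: how many days feed one prediction

def make_windows(data, nb_timesteps):
    # Stack overlapping windows of consecutive rows, producing the 3D shape
    # (nb_days - nb_timesteps + 1, nb_timesteps, nb_features) an LSTM expects.
    return np.array([data[i:i + nb_timesteps]
                     for i in range(len(data) - nb_timesteps + 1)])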
In your model definition you placed a Dense layer before the LSTM layer. Since the input at that point is still a 3D sequence, you need to wrap the Dense layer in a TimeDistributed layer so it is applied at every time step.
Try to change
model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
to
# requires: from keras.layers import TimeDistributed
model = Sequential([
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
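For completeness, a hedged sketch of compiling and training that model on the 3D data, reusing the variable names and Keras 1 API from the question. Note that 'accuracy' is not a meaningful metric for regression, so only the mse loss is tracked here:
model.compile(loss='mse', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_valid, Y_valid))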
You are still missing one preprocessing step before feeding the data to the LSTM. You will have to decide how many previous data samples (previous days) you want to include in the calculation of the current day's AdjClose. See my answer here on how to do that. Your data should then be 3-dimensional of shape (nb_samples, nb_included_previous_days, features).
Then you can feed the 3D data to a standard LSTM layer with a single output. You can compare this value to y_train and try to minimize the error. Remember to pick a loss function that is appropriate for regression, e.g. mean squared error.
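Putting it together, here is a minimal sketch of such a regression setup, under these assumptions: the Keras 1 API used in the question, a window of nb_timesteps previous days, a linear output (Adj Close is an unbounded value, so a sigmoid would squash it), and mean squared error as the loss. The helper make_windows is illustrative, not part of Keras:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM

nb_timesteps = 4   # assumed: previous days used per prediction
nb_features = 5    # e.g. Open, High, Low, Close, Volume

def make_windows(X, y, nb_timesteps):
    # Each sample holds nb_timesteps consecutive days of features;
    # its target is the Adj Close of the day right after the window.
    samples = np.array([X[i:i + nb_timesteps]
                        for i in range(len(X) - nb_timesteps)])
    targets = y[nb_timesteps:]
    return samples, targets

# X_raw: (nb_days, nb_features) float array, oldest day first
# y_raw: (nb_days,) Adj Close values aligned with X_raw
# X_seq, y_seq = make_windows(X_raw, y_raw, nb_timesteps)

model = Sequential()
model.add(LSTM(4, input_shape=(nb_timesteps, nb_features),
               dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1))  # default linear activation: unbounded regression target
model.compile(loss='mse', optimizer='rmsprop')
# model.fit(X_seq, y_seq, batch_size=32, nb_epoch=15)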