Keras LSTM Multiple Input Multiple Output

I am trying to train an RNN to predict stock prices in the future.

My goal is to train the model using two datasets: X_train and y_train.

X_train is a 3D array with shape (number of observations, number of previous candles, attributes of each candle).

y_train is a 3D array with shape (number of observations, number of candles in the future, price).

So if I have data from 500 candles, my X_train will be (430, 60, 6): for 430 observations (the current candle each time), take the 60 candles that came before it and 6 characteristics of each (close price, volume, etc.), and use that data (through the RNN) to predict y_train (430, 10, 1): for each of the 430 observations, predict the close price (that is the 1) for the next 10 candles.

I cannot, for the life of me, get the dimensions to enter the model correctly. I use the following code for the model:

from keras.models import Sequential
from keras.layers import LSTM, Dropout

regressor = Sequential()

regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 6)))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 50, return_sequences=True))
regressor.add(Dropout(0.2))

regressor.add(LSTM(units = 1))

regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 20, batch_size = 32)

I get ValueError: Error when checking target: expected lstm_6 to have 2 dimensions, but got array with shape (430, 10, 1)

Thanks a lot.

asked Apr 04 '18 by Panos Filianos

1 Answer

Let's take a step back here and look at what is being done and why it does not work. The error itself arises because your final LSTM(units = 1) layer has return_sequences=False, so it outputs a 2D tensor of shape (batch_size, 1), while your target y_train is a 3D array of shape (430, 10, 1).

Firstly, your input data is of the following shape:

(samples, timesteps, features)

Secondly, you would like your output data to be of the following shape:

(samples, future_timesteps, 1)
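
For reference, dummy arrays with these shapes would look like this (hypothetical placeholders, not your real candle data):

import numpy as np

X_train = np.random.rand(430, 60, 6)   # (samples, timesteps, features)
Y_train = np.random.rand(430, 10, 1)   # (samples, future_timesteps, 1)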

This kind of architecture is known as sequence to sequence learning (colloquially referred to as Seq2Seq).

So how do we do Seq2Seq? There are a few approaches you might want to read up on, and this is still an area of very active research. Here are some ideas.

Note that this is done with the Keras functional API, which is better suited to this kind of model.

Read the input sequence from both directions, then predict the next 10 prices by having a final dense layer of 10 units.

from keras.models import Model
from keras.layers import Input, Dense, LSTM, Bidirectional, TimeDistributed

input_layer = Input(shape=(60, 6))

lstm = Bidirectional(
    LSTM(250),
    merge_mode='concat'
)(input_layer)

pred = Dense(10)(lstm)
model = Model(inputs=input_layer, outputs=pred)
model.compile(optimizer = 'adam', loss = 'mean_squared_error')

model.fit(X_train, Y_train, epochs = 20, batch_size = 32)

where Y_train is reshaped to (430, 10) instead of (430, 10, 1). In response to a comment, this does not alter the labels (Y_train) in any meaningful way. This is because the difference between (x, y, 1) and (x,y) is just as follows:

[[[1],[2],[3]],
 [[4],[5],[6]]]

instead of

[[1,2,3],
 [4,5,6]]

So a call like:

Y_train = np.reshape(Y_train, Y_train.shape[:2])

does not meaningfully affect the training data.

However, this may not be the best architecture to begin with. A single dense layer takes in only the last hidden states from the forward and backward directions, instead of every hidden state (from both directions) at every timestep, so the model above effectively has less information to work with than the model below. I suggest the following as an alternative.

input_layer = Input(shape=(60, 6))

# return_sequences=True goes on the wrapped LSTM so the encoder emits
# a hidden state for every timestep, not just the last one
encoder = Bidirectional(
    LSTM(250, return_sequences=True),
    merge_mode='concat'
)(input_layer)
decoder = LSTM(250, return_sequences=True)(encoder)
pred = TimeDistributed(Dense(1))(decoder)
model = Model(inputs=input_layer, outputs=pred)
model.compile(optimizer = 'adam', loss = 'mean_squared_error')

model.fit(X_train, Y_train, epochs = 20, batch_size = 32)

where Y_train is formatted as (430, 60, 1), since the output sequence length now matches the 60 input timesteps. If you only care about the next 10 entries, pass sample weights into fit and weight everything after the 10th time index as 0 (you can even populate those positions with garbage while training). This would be done as follows:

# tile the (430, 10, 1) targets along the time axis to get (430, 60, 1);
# the extra 50 steps are padding that the weights below will zero out
Y_train = np.hstack([Y_train] * 6)

Then you would create a sample weight mask like:

W = np.zeros(Y_train.shape[:2])   # (430, 60): one weight per sample per timestep
W[:, :10] = 1                     # only the first 10 future steps count towards the loss

That is, a 2D mask of shape (samples, timesteps) where only the first 10 entries along the time axis are 1 and all others are zero (Keras expects timestep-wise sample weights to be 2D). Pass this W as the sample_weight argument to model.fit, and compile the model with sample_weight_mode='temporal' so a weight is applied per timestep.
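
Putting it together, a minimal sketch of the training call (assuming the encoder/decoder model above and the padded Y_train):

model.compile(optimizer = 'adam', loss = 'mean_squared_error',
              sample_weight_mode = 'temporal')   # apply one weight per timestep
model.fit(X_train, Y_train, epochs = 20, batch_size = 32, sample_weight = W)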

This way, the model can have a true sequence to sequence notion in the encoder/decoder paradigm.

Finally, additional LSTMs (stacking) are not necessarily bad, but stacking is regarded as an incremental improvement at best for models of this nature, and it adds a large amount of complexity and severely increases training time. Get a single model working with a recurrent depth of 1 (no stacking), and then you can experiment with stacking your single LSTM, or with stacking encoders/decoders in the second structure I gave you.

Some additional tips for what you are doing:

  • Scale your data: StandardScaler, MinMaxScaler, whatever. Do not pass raw price data into an LSTM (or any deep learning model), as the activation functions will squash those values towards -1, 1, or 0 and you will be subject to the vanishing or exploding gradients problem. (See the sketch below.)
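
As a minimal sketch of that tip (assuming scikit-learn is available; raw_data is a hypothetical (n_candles, 6) array of unscaled candle features, not your real dataset):

from sklearn.preprocessing import MinMaxScaler
import numpy as np

scaler = MinMaxScaler(feature_range=(0, 1))
raw_data = np.random.rand(500, 6)              # placeholder for the unscaled candle features
scaled_data = scaler.fit_transform(raw_data)   # fit the scaler on training data only
# build the (samples, 60, 6) windows from scaled_data afterwards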

I hope this helps!

answered Oct 22 '22 by modesitt