I have a dataset from a number of users (nUsers). Each user is sampled randomly in time (non-constant nSamples for each user). Each sample has a number of features (nFeatures). For example:
nUsers = 3 ---> 3 users
nSamples = [32, 52, 21] ---> first user was sampled 32 times, the second user 52 times, etc.
nFeatures = 10 ---> constant number of features for each sample.
I would like the LSTM to produce a current prediction based on the current features and on previous predictions for the same user. Can I do that in Keras using an LSTM layer? I have 2 problems:
Thanks for your help!
It sounds like each user is a sequence, so users may be the "batch size" for your problem. So at first, nExamples = nUsers.
If I understood your problem correctly (predict the next element), you should define a maximum length of "looking back". Say, for instance, you can predict the next element from the 7 previous ones (rather than from the entire sequence).
For that, you should separate your data like this:
example 1: x[0] = [s0, s1, s2, ..., s6] | y[0] = s7
example 2: x[1] = [s1, s2, s3, ..., s7] | y[1] = s8
Where sn is a sample with 10 features.
Usually, it doesn't matter if you mix users. Create these little segments for all users and put everything together.
This will result in arrays shaped like
x.shape -> (BatchSize, 7, 10) -> (BatchSize, 7 step sequences, 10 features)
y.shape -> (BatchSize, 10)
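The segmentation above can be sketched with plain NumPy. Here `make_windows` is a hypothetical helper (not part of Keras), using the 7-step look-back from the example and toy random data for the three users in the question:

```python
import numpy as np

def make_windows(samples, window=7):
    """Slice one user's (nSamples, nFeatures) array into sliding windows:
    x[i] = `window` consecutive samples, y[i] = the sample right after them."""
    x, y = [], []
    for i in range(len(samples) - window):
        x.append(samples[i:i + window])
        y.append(samples[i + window])
    return x, y

# Toy data for the three users from the question: 32, 52 and 21 samples, 10 features each.
users = [np.random.rand(n, 10) for n in (32, 52, 21)]

# Mix all users' segments together into one training set.
all_x, all_y = [], []
for u in users:
    x_u, y_u = make_windows(u)
    all_x += x_u
    all_y += y_u

x = np.array(all_x)   # (BatchSize, 7, 10)
y = np.array(all_y)   # (BatchSize, 10)
```

With these lengths you get (32-7) + (52-7) + (21-7) = 84 examples, so `x.shape` is `(84, 7, 10)` and `y.shape` is `(84, 10)`.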
Maybe you don't mean predicting the next set of features, but just predicting something. In that case, just replace y with the value you want. That may result in y.shape -> (BatchSize,) if you want just a single result.
Now, if you do need the entire sequence for predicting (instead of n previous elements), then you will have to define the maximum length and pad the sequences.
Suppose your longest sequence, as in your example, is 52. Then:
x.shape -> (Users, 52, 10).
Then you will have to "pad" the sequences to fill the blanks.
You can for instance fill the beginning of the sequences with zero features, such as:
x[0] = [s0, s1, s2, ......., s51] -> user with the longest sequence
x[1] = [0 , 0 , s0, s1, ..., s49] -> user with a shorter sequence
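A minimal sketch of that zero-pre-padding, again with toy random data for the three users (the helper array names are mine, not a Keras API):

```python
import numpy as np

nFeatures = 10
users = [np.random.rand(n, nFeatures) for n in (32, 52, 21)]

maxlen = max(len(u) for u in users)   # 52 in the question's example

# Pre-pad: zeros at the beginning, real samples aligned to the end.
x = np.zeros((len(users), maxlen, nFeatures))
for i, u in enumerate(users):
    x[i, maxlen - len(u):] = u

# x.shape is now (3, 52, 10); user 0's first 20 timesteps are all zeros.
```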
Or (I'm not sure this works; I never tested it) pad the ending with zero values and use the Masking layer, which is what Keras has for "variable length sequences". You still use a fixed-size array, but internally it will (?) discard the zero values.
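For reference, a Masking-based model could be sketched like this (the layer sizes are arbitrary choices for illustration; `Masking` marks timesteps whose features all equal `mask_value`, and the LSTM skips them when updating its state):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Masking, LSTM, Dense

nFeatures = 10
maxlen = 52   # longest sequence from the question

model = Sequential([
    Input(shape=(maxlen, nFeatures)),
    Masking(mask_value=0.0),   # all-zero timesteps are ignored downstream
    LSTM(32),                  # returns only the final state per sequence
    Dense(nFeatures),          # predict the next 10-feature sample
])
model.compile(optimizer="adam", loss="mse")
```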