How to feed LSTM when training data is in multiple csv files of time series of different length?

Question

I am training an LSTM to classify medical recordings per patient. Each patient (one observation) has its own CSV file, so the whole dataset is a collection of CSV files, each holding a DataFrame of time series. This is less obvious than it sounds, because there is one important difference between feeding a network images and feeding it time series: the size of the inputs. A CNN assumes all inputs have the same size, but here the sequences have different lengths.

Question:

How to feed LSTM in this case?

If you are familiar with image classification you can probably help with my question, but note that the same approach does not carry over directly.

Example

For one patient I have a DataFrame that holds all the recordings I want to use in my LSTM.

df.shape
Out[29]: (5679000, 4) 
# The 5679000 changes from one patient to another, but the 4 columns are fixed

Have a look here:

df.head(5)

Out[30]: 

   AIRFLOW     SaO2    ECG  Target  
0    -34.0  31145.0  304.0     0.0  
1    -75.0  31145.0  272.0     0.0  
2    -63.0  31145.0  254.0     0.0  
3    -57.0  31145.0  251.0     1.0  
4    -60.0  31145.0  229.0     0.0

Problem:

Any suggestions on how to feed my network?

Luke DeLuccia · Accepted Answer

Since your sequences have different lengths, you can't easily train your network on all of them at once. Instead, you have to either train in mini-batches of size 1 or fix your sequence length (for example by padding or truncating), although the latter probably doesn't make sense for the data you're dealing with.
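
If you do want to try the fixed-length option, one common way is to truncate long recordings and zero-pad short ones to a chosen length. Here is a minimal NumPy sketch, assuming each recording is already an array of shape (timesteps, 3). The helper name pad_or_truncate and the target length of 6 are made up for illustration and are not from the question:

```python
import numpy as np

def pad_or_truncate(seq, length, pad_value=0.0):
    """Return a (length, n_features) copy of seq, cut or padded at the end."""
    seq = np.asarray(seq, dtype=np.float32)
    out = np.full((length, seq.shape[1]), pad_value, dtype=np.float32)
    n = min(length, seq.shape[0])  # number of real time steps that fit
    out[:n] = seq[:n]
    return out

# Two toy "patients" with different numbers of time steps, 3 features each
short_rec = np.ones((4, 3))
long_rec = np.ones((10, 3))

# After padding/truncating, the recordings can be stacked into one batch
batch = np.stack([pad_or_truncate(s, 6) for s in (short_rec, long_rec)])
print(batch.shape)  # (2, 6, 3)
```

Keep in mind that padded steps still pass through the LSTM unless they are masked, and truncation throws data away, which is part of why this route may not suit recordings like yours.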

Take a look at the Keras function train_on_batch. Using this, you can train your model on one patient at a time, although a batch size of 1 has its own issues, such as noisy gradient updates and slow training.

As for the model, I would suggest using the Keras functional API. If you want to try something simple, just use an input sequence of variable length and a feature size of 3 (AIRFLOW, SaO2 and ECG). This should give you a baseline, which is what I assume you want from your function name. Something like this:

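from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# None leaves the number of time steps unspecified; 3 features per step
input_ = Input(shape=(None, 3))
x = LSTM(128)(input_)
output = Dense(1, activation='sigmoid')(x)
model = Model(input_, output)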
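
To tie this to train_on_batch, here is a rough sketch of a per-patient training loop. It makes several assumptions that are not in your question: each CSV has a header row and the column order AIRFLOW, SaO2, ECG, Target with no index column, and you have one 0/1 label per patient. How your per-step Target column maps to a single label per patient isn't stated, so the sketch simply takes the labels as a separate argument. The names load_patient and train_per_patient are hypothetical helpers, not Keras functions:

```python
import numpy as np

def load_patient(path):
    """Load one patient's CSV as features of shape (1, timesteps, 3).

    Assumes a header row and the column order AIRFLOW, SaO2, ECG, Target
    with no index column; adjust this to the real files.
    """
    data = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
    features = data[:, :3].astype(np.float32)
    return features[np.newaxis, ...]  # add the batch dimension of size 1

def train_per_patient(model, csv_paths, labels, epochs=1):
    """Call model.train_on_batch once per patient per epoch.

    labels holds one 0/1 value per patient (an assumption, see above).
    """
    losses = []
    for _ in range(epochs):
        for path, label in zip(csv_paths, labels):
            x = load_patient(path)
            y = np.array([[label]], dtype=np.float32)  # shape (1, 1)
            losses.append(model.train_on_batch(x, y))
    return losses
```

With a model like the one above you would first compile it, for example model.compile(optimizer='adam', loss='binary_crossentropy'), and then call something like train_per_patient(model, ['patient_01.csv', 'patient_02.csv'], [0, 1]), where the file names are placeholders. Because every batch holds a single patient, each call can have a different number of time steps.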

Tags:

python

tensorflow

keras

lstm

time-series

smerllo
