
How to feed LSTM when training data is in multiple csv files of time series of different length?

I am running an LSTM to classify medical recordings for each patient. For each patient (an observation) I have one CSV file, so the whole dataset is a set of CSV files, each holding a DataFrame of time series. This is less straightforward than feeding a network with images because the sequences differ in length: a CNN assumes all inputs have the same size, but here the inputs have different lengths.

Question:

How do I feed an LSTM in this case?

If you are familiar with image classification you can probably help with my question, but the same approach does not apply directly here.

Example

For one patient I have a DataFrame that holds all the recordings I want to use in my LSTM.

df.shape
Out[29]: (5679000, 4) 
# The 5679000 changes from one patient to another, but the 4 columns are fixed

Have a look here:

df.head(5)

Out[30]: 

   AIRFLOW     SaO2    ECG  Target  
0    -34.0  31145.0  304.0     0.0  
1    -75.0  31145.0  272.0     0.0  
2    -63.0  31145.0  254.0     0.0  
3    -57.0  31145.0  251.0     1.0  
4    -60.0  31145.0  229.0     0.0  

Problem:

Any suggestions on how to feed my network?

smerllo asked Feb 20 '19 20:02


1 Answer

Since your data points have variable sequence lengths, you can't easily train your network all at once. Instead, you must train in mini batches of size 1 or fix your sequence length, although the latter probably doesn't make sense based on the data you're dealing with.
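If you do want to try the fixed-length option, one possible approach is to cut each recording into windows of equal length. The sketch below is only an illustration: the function name make_windows, the window length, and the rule that a window counts as positive when any of its time steps has Target == 1 are assumptions, not something stated in the question.

```python
import numpy as np

def make_windows(recording, window_len):
    """Cut one recording of shape (T, 4) into fixed-length windows.

    Columns are assumed to be AIRFLOW, SaO2, ECG, Target, as in the question.
    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = recording.shape[0] // window_len
    trimmed = recording[: n_windows * window_len]
    windows = trimmed.reshape(n_windows, window_len, recording.shape[1])
    x = windows[:, :, :3].astype("float32")    # (n_windows, window_len, 3)
    # Assumption: a window is positive if any of its time steps has Target == 1.
    y = (windows[:, :, 3].max(axis=1) > 0).astype("float32")
    return x, y

# Toy stand-in for one patient's df.values (10 time steps, 4 columns).
toy = np.zeros((10, 4))
toy[3, 3] = 1.0                                # one positive time step
x, y = make_windows(toy, window_len=4)
print(x.shape, y)                              # (2, 4, 3) [1. 0.]
```

Windows from all patients then share one shape and can be stacked into ordinary batches, at the cost of losing any context longer than one window.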

Take a look at the Keras function train_on_batch. With it you can train your model on one patient at a time, although a batch size of 1 has its own issues, such as noisy gradient updates and slow training.
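A per-patient training loop built on train_on_batch might look roughly like the sketch below. The helper name train_one_epoch and the idea of one 0/1 label per patient are assumptions made for illustration; the only Keras call used is model.train_on_batch(x, y).

```python
import numpy as np

def train_one_epoch(model, patients):
    """Call model.train_on_batch once per patient (batch size 1).

    `patients` is an iterable of (features, label) pairs, where features has
    shape (T, 3) with T varying per patient and label is 0 or 1 (assumed).
    """
    losses = []
    for features, label in patients:
        x = np.asarray(features, dtype="float32")[np.newaxis, ...]  # (1, T, 3)
        y = np.array([[label]], dtype="float32")                    # (1, 1)
        result = model.train_on_batch(x, y)
        # With metrics compiled in, Keras returns a list; the loss comes first.
        loss = result[0] if isinstance(result, (list, tuple)) else result
        losses.append(float(loss))
    return float(np.mean(losses))
```

With a compiled Keras model you would call train_one_epoch(model, patients) once per epoch, ideally shuffling the order of patients between epochs.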

As for the model, I would suggest using the Keras functional API. If you want to try something simple, just use an input sequence of variable length and a feature size of 3. This should give you a baseline, which is what I assume you want from your function name. Something like this:

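from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# None leaves the sequence length unspecified; 3 features per time step
input_ = Input(shape=(None, 3))
x = LSTM(128)(input_)
output = Dense(1, activation='sigmoid')(x)
model = Model(input_, output)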
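
To get data into the (None, 3) input shape, the CSV files can be read one at a time rather than loaded together. The following is a minimal sketch under several assumptions: the function name iter_patients is hypothetical, each file is assumed to have a header row followed by exactly the four numeric columns from the question (no index column), and the per-patient label is assumed to be 1 if any row has Target == 1.

```python
import glob
import numpy as np

def iter_patients(pattern):
    """Yield one (features, label) pair per CSV file matching `pattern`.

    Assumes a header row, then the columns AIRFLOW, SaO2, ECG, Target.
    """
    for path in sorted(glob.glob(pattern)):
        data = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
        features = data[:, :3].astype("float32")   # (T, 3), T differs per file
        # Assumption: the patient is positive if any row has Target == 1.
        label = int(data[:, 3].max() > 0)
        yield features, label
```

Adding a leading batch axis with features[np.newaxis] gives an array of shape (1, T, 3), which matches Input(shape=(None, 3)) for any T.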
Luke DeLuccia answered Oct 24 '22 21:10