I am training an LSTM to classify medical recordings for each patient. Each patient (one observation) has its own CSV file, so the whole dataset is a collection of CSV files, each of which loads into a DataFrame of time series. Feeding this to a network is less obvious than feeding it images, because of one difference: the SIZE of the sequences. A CNN for image classification assumes that all inputs have the same size, but here the inputs have different lengths.
Question:
How do I feed an LSTM in this case?
If you are familiar with image classification you can probably help with my question, but the same approach does not carry over directly.
Example
For one patient I have a DataFrame that holds all the recordings I want to use in my LSTM.
df.shape
Out[29]: (5679000, 4)
# The 5679000 changes from one patient to another, but the 4 columns are fixed
Have a look here:
df.head(5)
Out[30]:
   AIRFLOW     SaO2    ECG  Target
0    -34.0  31145.0  304.0     0.0
1    -75.0  31145.0  272.0     0.0
2    -63.0  31145.0  254.0     0.0
3    -57.0  31145.0  251.0     1.0
4    -60.0  31145.0  229.0     0.0
Problem:
Any suggestions on how to feed my network?
Since your sequences have variable lengths, you can't simply stack them into one array and train your network on everything at once. Instead, you must either train in mini-batches of size 1 or fix your sequence length, although the latter probably doesn't make sense given the data you're dealing with.
Take a look at the Keras function train_on_batch. Using it, you can train your model on one patient at a time, although a batch size of 1 has its own issues, such as noisy gradient updates and slow training.
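If you do want to try fixing the sequence length anyway, one option is to cut each recording into equal-length windows. This is only a rough sketch under my own assumptions: the helper name split_into_windows, the window length of 100 and the random arrays standing in for the CSV data are all made up for illustration, and it says nothing about how to derive a label for each window from the Target column.

```python
import numpy as np


def split_into_windows(recording, window_len):
    """Cut a (time_steps, n_features) array into non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    Returns an array of shape (n_windows, window_len, n_features).
    """
    n_windows = recording.shape[0] // window_len
    trimmed = recording[: n_windows * window_len]
    return trimmed.reshape(n_windows, window_len, recording.shape[1])


# Synthetic stand-ins for two patients with different recording lengths
# (hypothetical data, 3 feature columns as in the question).
rng = np.random.default_rng(0)
patient_a = rng.normal(size=(1050, 3))
patient_b = rng.normal(size=(730, 3))

windows_a = split_into_windows(patient_a, window_len=100)
windows_b = split_into_windows(patient_b, window_len=100)
print(windows_a.shape, windows_b.shape)
```

After this step every window has the same shape, so windows from different patients can be stacked into ordinary batches. The cost is that the network no longer sees a whole recording at once.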
As for the model, I would suggest using the Keras functional API. If you want to try something simple, just use an input sequence of variable length and a feature size of 3. This should give you a baseline, which is what I assume you want from your function name. Something like this:
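from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# None = variable number of time steps; 3 = AIRFLOW, SaO2, ECG
input_ = Input(shape=(None, 3))
x = LSTM(128)(input_)  # final hidden state summarizes the whole sequence
output = Dense(1, activation='sigmoid')(x)  # binary prediction
model = Model(input_, output)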
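To tie the two ideas together, here is a rough sketch of training such a model one patient at a time with train_on_batch. It rests on several assumptions of mine: the patients list holds small random arrays in place of the real CSV files, each patient gets a single made-up 0/1 label, the LSTM is shrunk to 8 units so it runs quickly, and the optimizer and loss are just common defaults for binary classification.

```python
import numpy as np
from tensorflow import keras

# Same baseline architecture, with a smaller LSTM to keep the sketch fast.
input_ = keras.layers.Input(shape=(None, 3))
x = keras.layers.LSTM(8)(input_)
output = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(input_, output)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Hypothetical data: (features, label) per patient, with different lengths.
rng = np.random.default_rng(0)
patients = [
    (rng.normal(size=(50, 3)).astype("float32"), 0.0),
    (rng.normal(size=(80, 3)).astype("float32"), 1.0),
    (rng.normal(size=(65, 3)).astype("float32"), 0.0),
]

losses = []
for epoch in range(2):
    for features, label in patients:
        # Add a leading batch axis: (time_steps, 3) -> (1, time_steps, 3).
        x_batch = features[np.newaxis, ...]
        y_batch = np.array([[label]], dtype="float32")  # shape (1, 1)
        loss = model.train_on_batch(x_batch, y_batch)
        # np.ravel handles both a scalar and a list-like return value.
        losses.append(float(np.ravel(loss)[0]))

print(losses)
```

Because each batch contains exactly one sequence, no padding is needed and every patient can keep their own length. With recordings millions of steps long, a single pass per patient will still be slow, so treat this as a starting point rather than a tuned setup.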