As the title suggests, I have a time-series dataset with a lot of missing data. What is the best way to handle this for an LSTM model?
To give more detail, I build the dataset from about five data sources, and some of them do not provide historical data, so I'm missing a lot of values for the features that come from those sources. I can fill some of the gaps with the most recently observed sample, but for most of them that isn't possible.
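Where carrying the most recent observation forward is acceptable, a minimal pandas sketch looks like this. The column names, the toy values, and the choice of `limit=1` (fill at most one consecutive gap) are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data from two sources; NaN marks a value that is not available.
df = pd.DataFrame(
    {
        "source_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "source_b": [np.nan, 10.0, np.nan, np.nan, 12.0],
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Carry the last observed value forward; limit=1 (an arbitrary choice here)
# fills at most one consecutive NaN, so longer gaps stay missing.
filled = df.ffill(limit=1)
print(filled)
```

Leading gaps (before the first observation) and gaps longer than `limit` remain NaN, so they still have to be handled some other way.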
Some suggestions I have seen are:
But with each of them I feel like I would lose a lot of data integrity. How is this usually handled, and what is the best way to account for it in LSTM models?
I'm using Python / Keras / TensorFlow.
Maybe adding a `Masking` layer as the first layer of your model could help. From the Keras documentation:
For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
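A minimal sketch of that idea, assuming `tf.keras` with the TensorFlow backend and a sentinel value of `-999.0` that never occurs in the real (scaled) data; the sentinel, the toy array, and the layer sizes are hypothetical. Because NaN never compares equal to itself, missing values are first replaced with the sentinel, and the `Masking` layer then marks which timesteps the LSTM should skip.

```python
import numpy as np
import tensorflow as tf

MASK_VALUE = -999.0  # hypothetical sentinel; must never appear in the real data

# Toy batch: 2 sequences, 4 timesteps, 3 features; NaN marks missing values.
x = np.array(
    [
        [[0.1, 0.2, 0.3], [np.nan, np.nan, np.nan], [0.4, np.nan, 0.6], [0.7, 0.8, 0.9]],
        [[np.nan, np.nan, np.nan], [0.5, 0.5, 0.5], [0.2, 0.1, 0.0], [0.3, 0.3, 0.3]],
    ],
    dtype="float32",
)

# NaN != NaN, so NaN cannot serve as mask_value; swap it for the sentinel.
x_filled = np.where(np.isnan(x), MASK_VALUE, x).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 3)),                 # variable-length sequences, 3 features
    tf.keras.layers.Masking(mask_value=MASK_VALUE),  # skip all-sentinel timesteps
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Inspect the per-timestep mask (True = timestep is kept, False = skipped).
mask = tf.keras.layers.Masking(mask_value=MASK_VALUE).compute_mask(tf.constant(x_filled))
print(mask.numpy())
```

Per the rule quoted above, a timestep is skipped only when every feature equals `mask_value`. In the toy data, the third timestep of the first sequence (only one feature missing) is still fed to the LSTM, with the sentinel in that feature, so timesteps where only some sources are missing still need a separate fill strategy.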