 

Handling Missing Data in RNN / LSTM (Time-Series)

As the title suggests, I have a time-series dataset with a lot of missing data. What is the best way to handle this for an LSTM model?

To give further detail: I build the dataset from about five data sources, and some of them do not let me retrieve historical data, so I'm missing quite a bit of data for the features that come from those sources. I can fill some of the gaps using the most recently observed sample, but for the most part that isn't possible.
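Filling a gap with the most recently observed sample is a forward fill. A minimal sketch with pandas, assuming the data sits in a DataFrame with one column per feature (the column names and values below are hypothetical): the limit argument caps how many consecutive gaps get filled, and leading gaps, where nothing has been observed yet, stay NaN.

```python
from typing import Optional

import numpy as np
import pandas as pd


def forward_fill(df: pd.DataFrame, limit: Optional[int] = None) -> pd.DataFrame:
    """Carry the last observed value forward, filling at most `limit` consecutive NaNs."""
    return df.ffill(limit=limit)


# Hypothetical readings from two sources; NaN marks a missing value.
df = pd.DataFrame(
    {
        "source_a": [1.0, np.nan, np.nan, 4.0, np.nan],
        "source_b": [np.nan, np.nan, 2.0, np.nan, 5.0],
    }
)

filled = forward_fill(df, limit=1)
# source_a -> [1.0, 1.0, NaN, 4.0, 4.0]  (second consecutive gap left alone)
# source_b -> [NaN, NaN, 2.0, 2.0, 5.0]  (leading gaps cannot be filled)
```

Whatever is still NaN after this step has to be handled some other way, which is exactly the case the question is about.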

Some suggestions I have seen are:

  • Hidden Markov Modeling
  • Expectation Maximization
  • Using a neural net to predict the missing values

But with all of these, I feel like I would be losing a lot of data integrity. How is this usually handled, and what is the best way to adjust for it in LSTM models?

I'm using Python / Keras / TensorFlow.

Zach asked Oct 28 '25 12:10


1 Answer

Maybe masking at the first (input) layer of your model could help, for example with Keras's Masking layer:

For each timestep in the input tensor (dimension #1 in the tensor), if all values in the input tensor at that timestep are equal to mask_value, then the timestep will be masked (skipped) in all downstream layers (as long as they support masking).
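A minimal sketch of how this could look, assuming inputs shaped (samples, timesteps, features) and a sentinel of -1.0 that never occurs in the real (scaled) data; the helper names and the layer sizes are hypothetical. NaN itself cannot serve as mask_value, because NaN never compares equal to anything, so missing entries are replaced with the sentinel first. The NumPy helper reproduces the rule quoted above to show which timesteps would be skipped; TensorFlow is imported inside build_model only so the helpers run without it.

```python
import numpy as np

MASK_VALUE = -1.0  # assumption: a value that never appears in the real data


def prepare_for_masking(x: np.ndarray, mask_value: float = MASK_VALUE) -> np.ndarray:
    """Replace NaNs with the sentinel; x has shape (samples, timesteps, features)."""
    return np.where(np.isnan(x), mask_value, x)


def masked_timesteps(x: np.ndarray, mask_value: float = MASK_VALUE) -> np.ndarray:
    """True where a timestep would be skipped: every feature equals mask_value."""
    return np.all(x == mask_value, axis=-1)


def build_model(timesteps: int, n_features: int, mask_value: float = MASK_VALUE):
    """Hypothetical small LSTM regressor with a Masking layer in front."""
    import tensorflow as tf  # lazy import so the NumPy helpers work without TF

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, n_features)),
        tf.keras.layers.Masking(mask_value=mask_value),
        tf.keras.layers.LSTM(32),  # LSTM supports masking, so masked steps are skipped
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


# One sample, three timesteps, two features.
x = np.array([[[0.5, 0.2], [np.nan, np.nan], [0.1, np.nan]]])
x_ready = prepare_for_masking(x)
print(masked_timesteps(x_ready))  # [[False  True False]]
```

Note the third timestep: only one of its two features is missing, so it is not masked and the sentinel is fed to the LSTM as if it were data. Masking therefore fits timesteps where every source is missing; partially missing timesteps still need some kind of fill (such as a forward fill) before training.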

Toni Piza answered Oct 30 '25 01:10


