Shuffling training data with LSTM RNN

Since an LSTM RNN uses previous events to predict current sequences, why do we shuffle the training data? Don't we lose the temporal ordering of the training data? How is it still effective at making predictions after being trained on shuffled training data?

703

asked Jun 27 '17 20:06

hellowill89

1 Answer

In general, when you shuffle the training data (a set of sequences), you shuffle the order in which the sequences are fed to the RNN; you don't shuffle the ordering within individual sequences. The temporal structure inside each sequence is therefore preserved. This is fine to do when your network is stateless:

Stateless Case:

The network's memory only persists for the duration of a sequence. Training on sequence B before sequence A doesn't matter because the network's memory state does not persist across sequences.
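To make the stateless case concrete, here is a minimal sketch using a toy one-unit recurrent cell in plain Python. The cell, its weights (`w_in`, `w_rec`), and the sequences are all made up for illustration; this is not an LSTM and no training happens. It only shows that when the hidden state is reset before every sequence, the result for each sequence is the same regardless of the order in which the sequences are presented.

```python
import math
import random

def run_sequence(seq, h=0.0, w_in=0.5, w_rec=0.8):
    """Toy recurrent cell (hypothetical): h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def stateless_pass(sequences):
    """Reset the hidden state to zero before every sequence."""
    return {name: run_sequence(seq, h=0.0) for name, seq in sequences}

# Three illustrative sequences; time steps inside each one stay in order.
data = [
    ("A", [0.1, 0.2, 0.3]),
    ("B", [0.9, 0.8, 0.7]),
    ("C", [-0.5, 0.0, 0.5]),
]

in_order = stateless_pass(data)

shuffled = data[:]
random.Random(0).shuffle(shuffled)  # reorders whole sequences, not their time steps
after_shuffle = stateless_pass(shuffled)

# Each sequence ends in the same hidden state either way.
print(in_order == after_shuffle)
```

In real training the order still changes which gradient update happens first, but it does not change what the network "remembers" while processing any given sequence.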

On the other hand:

Stateful Case:

The network's memory persists across sequences. Here, you cannot blindly shuffle your data and expect optimal results. Sequence A should be fed to the network before sequence B because A comes before B, and we want the network to evaluate sequence B with memory of what was in sequence A.
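The stateful case can be sketched with the same kind of toy cell. Again, the cell, weights, and data are hypothetical and stand in for a real stateful LSTM. Here the hidden state left over from one sequence is the starting state for the next, so the state reached after sequence B depends on whether A was seen first.

```python
import math

def run_sequence(seq, h, w_in=0.5, w_rec=0.8):
    """Toy recurrent cell (hypothetical): h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def stateful_pass(sequences):
    """Carry the hidden state across sequences; record it after each one."""
    h = 0.0
    states = {}
    for name, seq in sequences:
        h = run_sequence(seq, h)  # starts from the previous sequence's final state
        states[name] = h
    return states

seq_a = ("A", [0.1, 0.2, 0.3])
seq_b = ("B", [0.9, 0.8, 0.7])

a_then_b = stateful_pass([seq_a, seq_b])
b_then_a = stateful_pass([seq_b, seq_a])

# The state after B differs depending on whether A came first.
print(a_then_b["B"] != b_then_a["B"])
```

This is why shuffling sequences is risky when state is carried over: the network would process B with memory of some unrelated sequence rather than of A.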

118

answered Sep 25 '22 05:09

Brian Bartoldson

Tags: machine-learning, keras, lstm, recurrent-neural-network