If you use a stateful RNN in Keras to process a sequence of length N divided into N parts (each time step processed individually), the backpropagation horizon is limited to the second dimension of the input. That is, if your data has shape (num_sequences, num_time_steps_per_seq, data_dim), then backpropagation is done over a horizon of num_time_steps_per_seq time steps.
Take a look at https://github.com/fchollet/keras/issues/3669 for a discussion of this.
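As a rough sketch (all sizes here are made up for illustration), it is the timesteps dimension of the input you pass to the layer that bounds how far back gradients flow:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

# Made-up sizes for illustration.
num_sequences, num_time_steps_per_seq, data_dim = 32, 10, 8

x = np.random.random((num_sequences, num_time_steps_per_seq, data_dim))
y = np.random.random((num_sequences, 1))

model = Sequential()
# Gradients flow back through at most num_time_steps_per_seq
# (here 10) steps per weight update.
model.add(SimpleRNN(16, input_shape=(num_time_steps_per_seq, data_dim)))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
model.fit(x, y, epochs=1)
```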
There are a couple of things you need to know about RNNs in Keras. By default, the parameter return_sequences=False in all recurrent layers. This means that by default only the activations of the RNN after processing the entire input sequence are returned as output. If you want the activations at every time step, and want to optimize every time step separately, you need to pass return_sequences=True as a parameter (https://keras.io/layers/recurrent/#recurrent).
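A minimal sketch of the difference (the layer sizes and shapes are my own illustration, not from the original answer):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

timesteps, data_dim = 10, 8

# Default return_sequences=False: only the final activation is returned.
# Output shape: (batch_size, 32)
last_only = Sequential()
last_only.add(LSTM(32, input_shape=(timesteps, data_dim)))

# return_sequences=True: the activation at every time step is returned.
# Output shape: (batch_size, timesteps, 32)
per_step = Sequential()
per_step.add(LSTM(32, return_sequences=True,
                  input_shape=(timesteps, data_dim)))
# TimeDistributed applies the Dense layer (and hence the loss)
# at each time step, so every step is optimized separately.
per_step.add(TimeDistributed(Dense(1)))
```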
The next important thing to know is that all a stateful RNN does is remember the last activation. So if you have a large input sequence and break it up into smaller sequences (which I believe you are doing), the activation in the network is retained after processing the first sequence and therefore affects the activations when processing the second sequence. This has nothing to do with how the network is optimized: the network simply minimizes the difference between its output and the targets you give it.
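For example, here is a sketch (the chunk sizes and random targets below are placeholders I made up) of feeding one long sequence to a stateful layer in pieces, then resetting the state once the whole sequence has been seen:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Hypothetical setup: one long sequence of 100 steps,
# broken into 10 sub-sequences of 10 steps each.
batch_size, sub_seq_len, data_dim = 1, 10, 8
long_seq = np.random.random((1, 100, data_dim))

model = Sequential()
# stateful=True requires a fixed batch size via batch_input_shape.
model.add(LSTM(32, stateful=True,
               batch_input_shape=(batch_size, sub_seq_len, data_dim)))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')

for epoch in range(5):
    for i in range(0, 100, sub_seq_len):
        chunk = long_seq[:, i:i + sub_seq_len, :]
        target = np.random.random((batch_size, 1))  # placeholder targets
        # The last activation from the previous chunk carries over,
        # but gradients still only flow through the current 10 steps.
        model.train_on_batch(chunk, target)
    # Clear the carried-over state before restarting the long sequence.
    model.reset_states()
```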