
How do stateful bidirectional RNNs work in Keras

In Keras, the Bidirectional wrapper for RNNs also supports stateful=True. I don't really understand how this is supposed to work:

In a stateful unidirectional model, the state of a batch is carried over to the next batch. I guess it works the same way for the forward layer in a bidirectional model.

But where does the backward layer get its states from? If I understand everything correctly, it should technically receive its state from the "next" batch. But obviously the "next" batch has not been computed yet, so how does this work?

asked Feb 15 '17 by birnbaum


1 Answer

One may think about a Bidirectional layer in the following manner:

forward = Recurrent(...)(input)
backward = Recurrent(..., go_backwards=True)(input)
output = merge([forward, backward], ...)
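This conceptual picture can be reproduced with real layers in present-day tf.keras (go_backwards is the actual parameter name; the layer sizes and input shape below are arbitrary choices for illustration, not anything from the question):

```python
import tensorflow as tf

# Arbitrary illustrative dimensions
timesteps, features, units = 10, 8, 16

inputs = tf.keras.Input(shape=(timesteps, features))
# Forward branch reads the sequence left to right
forward = tf.keras.layers.LSTM(units, return_sequences=True)(inputs)
# Backward branch reads the sequence right to left
backward = tf.keras.layers.LSTM(units, return_sequences=True,
                                go_backwards=True)(inputs)
# Bidirectional's default merge_mode="concat" corresponds to this merge
outputs = tf.keras.layers.Concatenate()([forward, backward])
model = tf.keras.Model(inputs, outputs)
```

One detail this sketch omits: the Bidirectional wrapper also reverses the backward branch's output sequence so that both branches are aligned in time before merging.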

So, as you can see, you lose the temporal orientation: the input is analysed both from its beginning and from its end. In this case, setting stateful=True simply makes each branch take its starting state from the same branch on the previous batch (forward takes from forward, backward takes from backward).

This means your model loses the usual interpretation of statefulness, namely that samples from consecutive batches can be treated as a single continuous sequence divided into batches.
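A minimal sketch of this stateful behaviour (batch size, sequence length, and unit counts are made up for illustration; assumes the tf.keras API): feeding the same batch twice gives different outputs, because both branches start from the final states of the previous call rather than from zeros.

```python
import numpy as np
import tensorflow as tf

batch, steps, feats = 2, 5, 3
# stateful=True requires a fixed batch size
inputs = tf.keras.Input(batch_shape=(batch, steps, feats))
outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, stateful=True, return_sequences=True))(inputs)
model = tf.keras.Model(inputs, outputs)

x = np.random.rand(batch, steps, feats).astype("float32")
out1 = model.predict(x)
out2 = model.predict(x)  # same input, different output: states carried over
# model.reset_states() clears the states of both branches
```

Note that the carried-over backward state comes from the end of the backward pass over the previous batch, i.e. from the *start* of that batch's sequences, which is exactly why the "one long sequence" interpretation breaks down.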

answered Sep 19 '22 by Marcin Możejko