Can someone please explain this? I know bidirectional LSTMs have a forward and backward pass but what is the advantage of this over a unidirectional LSTM?
What is each of them better suited for?
Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems. In problems where all timesteps of the input sequence are available, Bidirectional LSTMs train two instead of one LSTMs on the input sequence.
A Bidirectional LSTM, or biLSTM, is a sequence processing model that consists of two LSTMs: one taking the input in a forward direction, and the other in a backwards direction.
The original LSTM model is comprised of a single hidden LSTM layer followed by a standard feedforward output layer. The stacked LSTM is an extension to this model that has multiple hidden LSTM layers where each layer contains multiple memory cells.
There are two main types of LSTM models that can be used for multi-step forecasting; they are: Vector Output Model. Encoder-Decoder Model.
LSTM in its core, preserves information from inputs that has already passed through it using the hidden state.
Unidirectional LSTM only preserves information of the past because the only inputs it has seen are from the past.
Using bidirectional will run your inputs in two ways, one from past to future and one from future to past and what differs this approach from unidirectional is that in the LSTM that runs backwards you preserve information from the future and using the two hidden states combined you are able in any point in time to preserve information from both past and future.
What they are suited for is a very complicated question but BiLSTMs show very good results as they can understand context better, I will try to explain through an example.
Lets say we try to predict the next word in a sentence, on a high level what a unidirectional LSTM will see is
The boys went to ....
And will try to predict the next word only by this context, with bidirectional LSTM you will be able to see information further down the road for example
Forward LSTM:
The boys went to ...
Backward LSTM:
... and then they got out of the pool
You can see that using the information from the future it could be easier for the network to understand what the next word is.
Adding to Bluesummer's answer, here is how you would implement Bidirectional LSTM from scratch without calling BiLSTM
module. This might better contrast the difference between a uni-directional and bi-directional LSTMs. As you see, we merge two LSTMs to create a bidirectional LSTM.
You can merge outputs of the forward and backward LSTMs by using either {'sum', 'mul', 'concat', 'ave'}
.
left = Sequential() left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform', forget_bias_init='one', return_sequences=True, activation='tanh', inner_activation='sigmoid', input_shape=(99, 13))) right = Sequential() right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform', forget_bias_init='one', return_sequences=True, activation='tanh', inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True)) model = Sequential() model.add(Merge([left, right], mode='sum')) model.add(TimeDistributedDense(nb_classes)) model.add(Activation('softmax')) sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True) model.compile(loss='categorical_crossentropy', optimizer=sgd) print("Train...") model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With