 

What's the difference between a bidirectional LSTM and an LSTM?

Can someone please explain this? I know bidirectional LSTMs have a forward and backward pass but what is the advantage of this over a unidirectional LSTM?

What is each of them better suited for?

asked Mar 26 '17 by shekit

People also ask

When should one use bidirectional LSTM as opposed to normal LSTM?

Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems. In problems where all timesteps of the input sequence are available, bidirectional LSTMs train two LSTMs instead of one on the input sequence.

What is bidirectional LSTM?

A Bidirectional LSTM, or biLSTM, is a sequence processing model that consists of two LSTMs: one taking the input in a forward direction, and the other in a backwards direction.
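For a concrete picture, here is a minimal sketch using the modern tf.keras API; the layer sizes, input shape, and sigmoid classification head are illustrative assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    # One LSTM reads the sequence left-to-right, the other right-to-left;
    # the Bidirectional wrapper runs both and merges their outputs.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 16)),       # (timesteps, features) -- assumed shape
        layers.Bidirectional(layers.LSTM(32)),  # forward LSTM + backward LSTM
        layers.Dense(1, activation='sigmoid'),  # e.g. binary sequence classification
    ])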

What is difference between LSTM and stacked LSTM?

The original LSTM model is comprised of a single hidden LSTM layer followed by a standard feedforward output layer. The stacked LSTM is an extension to this model that has multiple hidden LSTM layers where each layer contains multiple memory cells.
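As a minimal sketch (tf.keras, arbitrary layer sizes): each hidden LSTM layer except the last must return the full sequence so the layer above it receives one vector per timestep.

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 16)),        # (timesteps, features) -- assumed shape
        layers.LSTM(32, return_sequences=True),  # hidden LSTM layer 1
        layers.LSTM(32),                         # hidden LSTM layer 2
        layers.Dense(1),                         # standard feedforward output layer
    ])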

What are the types of LSTM?

There are two main types of LSTM models that can be used for multi-step forecasting: the Vector Output Model and the Encoder-Decoder Model.
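Hedged sketches of both styles in tf.keras; the window lengths (n_steps_in, n_steps_out) and layer sizes are made-up placeholders:

    import tensorflow as tf
    from tensorflow.keras import layers

    n_steps_in, n_steps_out, n_features = 10, 3, 1  # assumed window sizes

    # Vector Output Model: a single Dense layer emits the whole forecast at once.
    vector_output = tf.keras.Sequential([
        tf.keras.Input(shape=(n_steps_in, n_features)),
        layers.LSTM(32),
        layers.Dense(n_steps_out),
    ])

    # Encoder-Decoder Model: encode the input once, repeat the context vector
    # for each forecast step, then decode one value per step.
    encoder_decoder = tf.keras.Sequential([
        tf.keras.Input(shape=(n_steps_in, n_features)),
        layers.LSTM(32),                         # encoder
        layers.RepeatVector(n_steps_out),        # one context copy per output step
        layers.LSTM(32, return_sequences=True),  # decoder
        layers.TimeDistributed(layers.Dense(1)),
    ])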


2 Answers

At its core, an LSTM preserves information from inputs that have already passed through it using its hidden state.

A unidirectional LSTM only preserves information from the past, because the only inputs it has seen are from the past.

Using a bidirectional LSTM will run your inputs in two directions, one from past to future and one from future to past. What distinguishes this approach from a unidirectional LSTM is that the LSTM running backwards preserves information from the future; combining the two hidden states lets you, at any point in time, preserve information from both past and future.
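To make "the two hidden states combined" concrete, here is a hedged sketch in tf.keras that wires the two directions by hand (shapes and sizes are assumptions). Note that the backward outputs must be flipped back into forward time order before concatenating:

    import tensorflow as tf
    from tensorflow.keras import layers

    inputs = tf.keras.Input(shape=(None, 16))  # (timesteps, features) -- assumed
    h_fwd = layers.LSTM(32, return_sequences=True)(inputs)
    h_bwd = layers.LSTM(32, return_sequences=True, go_backwards=True)(inputs)
    # go_backwards returns outputs in reversed time order; re-align them.
    h_bwd = layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(h_bwd)
    # At every timestep t the output is [h_fwd_t ; h_bwd_t]: past and future context.
    outputs = layers.Concatenate()([h_fwd, h_bwd])
    model = tf.keras.Model(inputs, outputs)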

What each is better suited for is a complicated question, but BiLSTMs show very good results because they can understand context better. I will try to explain through an example.

Let's say we are trying to predict the next word in a sentence. At a high level, what a unidirectional LSTM will see is

The boys went to ....

and it will try to predict the next word from that context alone. With a bidirectional LSTM, you will be able to see information further down the road, for example:

Forward LSTM:

The boys went to ...

Backward LSTM:

... and then they got out of the pool

You can see that, using information from the future, it could be easier for the network to understand what the next word is.

answered Sep 18 '22 by bluesummers

Adding to bluesummers' answer, here is how you would implement a bidirectional LSTM from scratch, without calling a built-in BiLSTM module. This may better contrast the difference between a unidirectional and a bidirectional LSTM. As you can see, we merge two LSTMs to create a bidirectional LSTM. (Note that the snippet below uses the legacy, pre-1.0 Keras API.)

You can merge the outputs of the forward and backward LSTMs using any of {'sum', 'mul', 'concat', 'ave'}.

    # Legacy Keras (pre-1.0) API: output_dim, Merge, TimeDistributedDense,
    # nb_epoch and show_accuracy have all since been renamed or removed.
    # hidden_units, nb_classes, nb_epoches and the X_*/Y_* arrays are assumed
    # to be defined elsewhere.
    from keras.models import Sequential
    from keras.layers.core import Activation, Merge, TimeDistributedDense
    from keras.layers.recurrent import LSTM
    from keras.optimizers import SGD

    # Forward-direction LSTM over (99 timesteps, 13 features) inputs.
    left = Sequential()
    left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
                  forget_bias_init='one', return_sequences=True, activation='tanh',
                  inner_activation='sigmoid', input_shape=(99, 13)))

    # Backward-direction LSTM: go_backwards=True feeds the sequence in reverse.
    right = Sequential()
    right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
                   forget_bias_init='one', return_sequences=True, activation='tanh',
                   inner_activation='sigmoid', input_shape=(99, 13),
                   go_backwards=True))

    # Merge the two directions ('sum' here; 'mul', 'concat' and 'ave' also work),
    # then classify every timestep.
    model = Sequential()
    model.add(Merge([left, right], mode='sum'))
    model.add(TimeDistributedDense(nb_classes))
    model.add(Activation('softmax'))

    sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)

    print("Train...")
    # The same input is fed to both branches of the Merge.
    model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches,
              validation_data=([X_test, X_test], Y_test), verbose=1,
              show_accuracy=True)
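For reference, a rough modern equivalent of the snippet above using the tf.keras Bidirectional wrapper; merge_mode plays the role of the old Merge mode, and the hidden size and nb_classes here are placeholders:

    import tensorflow as tf
    from tensorflow.keras import layers

    nb_classes = 10  # placeholder -- set to your number of classes

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(99, 13)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True),
                             merge_mode='sum'),  # matches mode='sum' above
        layers.TimeDistributed(layers.Dense(nb_classes, activation='softmax')),
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.SGD(learning_rate=0.1,
                                                    momentum=0.9, nesterov=True))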
answered Sep 21 '22 by aerin