What's the difference between a bidirectional LSTM and an LSTM?

2 Answers

At its core, an LSTM preserves information from inputs that have already passed through it by means of its hidden state.

A unidirectional LSTM preserves only information from the past, because the only inputs it has seen are from the past.

A bidirectional LSTM runs your inputs in two directions, one from past to future and one from future to past. What sets this approach apart from the unidirectional one is that the LSTM running backwards preserves information from the future. By combining the two hidden states, you can, at any point in time, preserve information from both the past and the future.
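To make the two directions concrete, here is a minimal pure-Python sketch. It is not a real LSTM: `run_rnn` is a hypothetical toy scalar recurrence (with made-up weights `w_x` and `w_h`) standing in for the LSTM cell, just to show which inputs each direction's state can depend on.

```python
import math

def run_rnn(inputs, w_x=0.5, w_h=0.8):
    """Toy scalar recurrent cell (a stand-in for an LSTM, not a real one):
    h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def run_bidirectional(inputs):
    # Forward pass: the state at step t has only seen inputs[0..t].
    forward = run_rnn(inputs)
    # Backward pass: feed the reversed sequence, then flip the states back
    # so that index t lines up with the forward state for the same step.
    backward = run_rnn(inputs[::-1])[::-1]
    # Pair the two states per time step (a "concat"-style combination).
    return list(zip(forward, backward))

sequence = [1.0, -2.0, 0.5, 3.0]
changed_future = [1.0, -2.0, 0.5, -3.0]  # only the last step differs

states_a = run_bidirectional(sequence)
states_b = run_bidirectional(changed_future)

# At step 1 the forward state is identical (it never saw the last input),
# while the backward state differs (it has already seen the "future").
print(states_a[1][0] == states_b[1][0])
print(states_a[1][1] == states_b[1][1])
```

Changing only the last input leaves the forward state at step 1 untouched but changes the backward state at step 1, which is exactly the "information from the future" described above.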

What they are suited for is a very complicated question, but BiLSTMs show very good results because they can understand context better. I will try to explain through an example.

Let's say we try to predict the next word in a sentence. At a high level, what a unidirectional LSTM will see is:

The boys went to ...

It will try to predict the next word from this context alone. With a bidirectional LSTM, you are also able to see information further down the road, for example:

Forward LSTM:

The boys went to ...

Backward LSTM:

... and then they got out of the pool

You can see that, using the information from the future, it could be easier for the network to understand what the missing word is.
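The example above can be sketched in a few lines of Python. The helper `contexts_at` is a hypothetical name, and `___` is just a placeholder token I use for the word being predicted; the sketch only shows which words each direction has read by the time it reaches the gap.

```python
def contexts_at(tokens, position):
    """Return the words each direction has read before reaching `position`."""
    # The forward LSTM has read everything to the left, in reading order.
    forward_context = tokens[:position]
    # The backward LSTM has read everything to the right, starting from the end.
    backward_context = tokens[position + 1:][::-1]
    return forward_context, backward_context

tokens = "The boys went to ___ and then they got out of the pool".split()
gap = tokens.index("___")

forward_context, backward_context = contexts_at(tokens, gap)
print("forward sees: ", forward_context)
print("backward sees:", backward_context)
```

A unidirectional model only gets `forward_context`; a bidirectional one gets both, including the word "pool" from the end of the sentence.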

answered Sep 18 '22 13:09

bluesummers

Adding to bluesummers' answer, here is how you would implement a bidirectional LSTM from scratch without calling the BiLSTM module. This might better illustrate the difference between a unidirectional and a bidirectional LSTM. As you can see, we merge two LSTMs to create a bidirectional LSTM.

You can merge the outputs of the forward and backward LSTMs using one of the modes {'sum', 'mul', 'concat', 'ave'}.
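As a rough illustration of what those four modes do to the two output vectors at a single time step, here is a small pure-Python sketch. The function `merge_outputs` is a hypothetical helper written for this answer, not part of Keras.

```python
def merge_outputs(forward, backward, mode="concat"):
    """Combine the forward and backward output vectors of one time step."""
    if mode == "concat":
        # Concatenation doubles the feature dimension.
        return forward + backward
    if len(forward) != len(backward):
        raise ValueError("element-wise modes need vectors of equal length")
    if mode == "sum":
        return [f + b for f, b in zip(forward, backward)]
    if mode == "mul":
        return [f * b for f, b in zip(forward, backward)]
    if mode == "ave":
        return [(f + b) / 2 for f, b in zip(forward, backward)]
    raise ValueError("unknown merge mode: " + mode)

forward_out = [1.0, 2.0]
backward_out = [3.0, 4.0]

for mode in ("sum", "mul", "concat", "ave"):
    print(mode, merge_outputs(forward_out, backward_out, mode))
```

Note that 'sum', 'mul' and 'ave' keep the output size equal to the number of hidden units, while 'concat' doubles it, which matters for whatever layer comes next.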

# Note: Merge and TimeDistributedDense belong to an early Keras API
# and are not available in current Keras releases.

# Forward LSTM: reads the sequence from the first to the last time step.
left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
              forget_bias_init='one', return_sequences=True, activation='tanh',
              inner_activation='sigmoid', input_shape=(99, 13)))

# Backward LSTM: go_backwards=True reads the sequence from the last to the first time step.
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

# Combine the two directions element-wise, then classify every time step.
model = Sequential()
model.add(Merge([left, right], mode='sum'))

model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
# The same input is fed to both branches, hence [X_train, X_train].
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches,
          validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)
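For readers on a current Keras, a roughly equivalent model can be sketched with the built-in `Bidirectional` wrapper. This is my own approximation, not the original author's code: it assumes TensorFlow 2.x with `tensorflow.keras`, uses placeholder values for `hidden_units` and `nb_classes`, keeps default weight initializers, and leaves out the learning-rate decay. The wrapper also flips the backward outputs back into forward time order when `return_sequences=True`, so the two directions line up step by step before they are merged.

```python
from tensorflow import keras
from tensorflow.keras import layers

hidden_units = 32  # placeholder value, not from the original answer
nb_classes = 5     # placeholder value, not from the original answer

model = keras.Sequential([
    keras.Input(shape=(99, 13)),  # 99 time steps, 13 features per step
    # One forward and one backward LSTM, merged element-wise by summation.
    layers.Bidirectional(
        layers.LSTM(hidden_units, return_sequences=True),
        merge_mode="sum",
    ),
    # Per-time-step softmax classifier (modern stand-in for TimeDistributedDense).
    layers.TimeDistributed(layers.Dense(nb_classes, activation="softmax")),
])

sgd = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
model.summary()
```

With the wrapper there is a single input, so training would take `X_train` once rather than the `[X_train, X_train]` pair needed by the two-branch version.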

answered Sep 21 '22 13:09

aerin


Tags:

machine-learning

neural-network

keras

lstm

recurrent-neural-network

shekit
