Both of these attention mechanisms are used in seq2seq models, and the TensorFlow documentation introduces them as multiplicative and additive attention. What is the difference between them?
Bahdanau et al. proposed an attention mechanism that learns to align and translate jointly. It is also known as additive attention because it scores alignments using a linear combination of the encoder states and the decoder state, rather than a dot product.
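As a minimal NumPy sketch of the additive score (the weight names `W1`, `W2`, and `v` are illustrative, not from any particular library): the decoder and encoder states are projected, summed, and passed through `tanh`:

```python
import numpy as np

def additive_score(s_prev, h_enc, W1, W2, v):
    """Additive (Bahdanau-style) alignment scores.

    s_prev : (d_dec,)        previous decoder hidden state
    h_enc  : (T_src, d_enc)  encoder hidden states, one per source position
    W1     : (d_att, d_dec)  projects the decoder state
    W2     : (d_att, d_enc)  projects the encoder states
    v      : (d_att,)        maps the combined features to a scalar score
    """
    # score(s, h_i) = v^T tanh(W1 s + W2 h_i): states are added, not multiplied
    features = np.tanh(W1 @ s_prev + h_enc @ W2.T)   # (T_src, d_att)
    return features @ v                              # (T_src,)

def attention_weights(scores):
    # softmax over source positions turns scores into alignment weights
    e = np.exp(scores - scores.max())
    return e / e.sum()
```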
The attention mechanism allows the output to focus on the input while producing the output, whereas self-attention allows the inputs to interact with each other (i.e., it calculates the attention of all other inputs with respect to one input).
An attentional hidden state is computed by applying a learned weight matrix and $\tanh$ to the concatenation of the context vector and the current decoder hidden state: $\widetilde{\mathbf{s}}_t = \tanh(\mathbf{W_c} [\mathbf{c}_t \; ; \; \mathbf{s}_t])$.
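In code this is just a concatenation followed by a learned dense layer and `tanh`; a minimal NumPy sketch, with `W_c` playing the role of $\mathbf{W_c}$ in the formula above:

```python
import numpy as np

def attentional_hidden_state(c_t, s_t, W_c):
    """s~_t = tanh(W_c [c_t ; s_t])

    c_t : (d_enc,)            context vector (attention-weighted sum of encoder states)
    s_t : (d_dec,)            current decoder hidden state
    W_c : (d_out, d_enc + d_dec)
    """
    concat = np.concatenate([c_t, s_t])
    return np.tanh(W_c @ concat)
```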
Self-attention, also called intra-attention, is an attention mechanism that relates different positions of a single sequence in order to compute a representation of that same sequence. It has been shown to be very useful in machine reading, abstractive summarization, and image description generation.
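A minimal NumPy sketch of one common formulation, scaled dot-product self-attention, in which every position of the sequence attends to every other position (the projection names `Wq`, `Wk`, `Wv` are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each position of the same sequence X attends to every position.

    X          : (T, d_model)       input sequence
    Wq, Wk, Wv : (d_model, d_k)     learned projections (illustrative names)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T): all position pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (T, d_k)
```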
I went through Effective Approaches to Attention-based Neural Machine Translation. In section 3.1, they describe the difference between the two attentions as follows.
Luong attention uses the top-hidden-layer states of both the encoder and the decoder, whereas Bahdanau attention uses the concatenation of the forward and backward source hidden states (of the top hidden layer).
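A small sketch of that encoder-side difference (all array names are illustrative):

```python
import numpy as np

T, d = 5, 4                            # toy sequence length and hidden size
h_fwd = np.random.randn(T, d)          # forward RNN states (top layer)
h_bwd = np.random.randn(T, d)          # backward RNN states (top layer)

h_luong    = h_fwd                                     # Luong: top-layer states as-is
h_bahdanau = np.concatenate([h_fwd, h_bwd], axis=-1)   # Bahdanau: [h_fwd ; h_bwd], (T, 2d)
```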
In Luong attention, the decoder hidden state at time t is computed first. The attention scores are then calculated from it, yielding the context vector, which is concatenated with the decoder hidden state before making the prediction.
But in Bahdanau attention, at time t we use the decoder hidden state from time t-1. We then calculate the alignment and context vectors as above, but this context is concatenated with the decoder hidden state at t-1, and that concatenated vector goes inside a GRU before the softmax.
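A runnable toy sketch of the two orderings just described, using NumPy and a trivial stand-in recurrent cell; for brevity, both variants use a dot-product score here, even though Bahdanau actually uses the additive score:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                          # toy hidden size (illustrative)
enc = rng.normal(size=(3, d))                  # encoder states h_1..h_3
s_prev = rng.normal(size=d)                    # decoder state s_{t-1}
y_prev = rng.normal(size=d)                    # embedding of previous output token
W_c  = rng.normal(size=(d, 2 * d))             # combines [c_t ; s_t]  (Luong)
W_in = rng.normal(size=(d, 2 * d))             # combines [y ; c_t]    (Bahdanau)

def attend(query, enc_states):
    # dot-product scores -> softmax -> context (one of Luong's score choices)
    scores = enc_states @ query
    w = np.exp(scores - scores.max()); w /= w.sum()
    return w @ enc_states

def cell(s_prev, x):
    # stand-in for a real GRU/LSTM cell, just to make the ordering runnable
    return np.tanh(s_prev + x)

# Luong: advance the RNN first, then attend with the *current* state s_t
s_t = cell(s_prev, y_prev)
c_t = attend(s_t, enc)
s_tilde = np.tanh(W_c @ np.concatenate([c_t, s_t]))        # goes to the softmax

# Bahdanau: attend with the *previous* state s_{t-1};
# the context then feeds the recurrent cell itself
c_t = attend(s_prev, enc)
s_t = cell(s_prev, W_in @ np.concatenate([y_prev, c_t]))   # goes to the softmax
```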
Luong attention has different types of alignment score functions (dot, general, and concat), whereas Bahdanau attention has only the concat score alignment model.
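A sketch of the three Luong score functions side by side (the weight names `W_a` and `v_a` are illustrative); the concat variant is the additive form shared with Bahdanau:

```python
import numpy as np

def score_dot(s_t, h_i):
    # dot: s_t^T h_i (requires matching dimensions)
    return s_t @ h_i

def score_general(s_t, h_i, W_a):
    # general: s_t^T W_a h_i
    return s_t @ (W_a @ h_i)

def score_concat(s_t, h_i, W_a, v_a):
    # concat (additive, as in Bahdanau): v_a^T tanh(W_a [s_t ; h_i])
    return v_a @ np.tanh(W_a @ np.concatenate([s_t, h_i]))
```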