 

What is the difference between Luong attention and Bahdanau attention?

These two attention mechanisms are used in seq2seq models. They are introduced as multiplicative and additive attention in the TensorFlow documentation. What is the difference between them?

Shamane Siriwardhana asked May 29 '17 08:05

People also ask

What is Bahdanau attention?

The Bahdanau attention mechanism learns to align and translate jointly. It is also known as additive attention because it scores each source position through a combination (addition) of projected encoder states and the decoder state.
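For intuition, here is a minimal NumPy sketch of the additive score; the parameter names `W1`, `W2`, `v` and the exact shapes are illustrative assumptions, not any library's API:

```python
import numpy as np

def bahdanau_attention(prev_decoder_state, encoder_states, W1, W2, v):
    """Additive (Bahdanau) attention: score_j = v . tanh(W1 s_{t-1} + W2 h_j).

    prev_decoder_state: (d_dec,)          previous decoder hidden state s_{t-1}
    encoder_states:     (src_len, d_enc)  encoder hidden states h_1..h_n
    W1: (d_att, d_dec), W2: (d_att, d_enc), v: (d_att,)  learned parameters (names assumed)
    """
    # Project both states into a shared space, add them, and squash with tanh.
    hidden = np.tanh(prev_decoder_state @ W1.T + encoder_states @ W2.T)  # (src_len, d_att)
    scores = hidden @ v                                                  # (src_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over source positions
    context = weights @ encoder_states  # context vector: weighted sum of encoder states
    return weights, context
```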

What is difference between attention and self attention?

The attention mechanism lets the output focus on the input while the output is being produced, whereas self-attention lets the inputs interact with each other (i.e., it computes the attention of all other inputs with respect to one input).

How are attention scores computed in Luong's method?

An attentional hidden state is computed based on a weighted concatenation of the context vector and the current decoder hidden state: $\widetilde{\mathbf{s}}_t = \tanh(\mathbf{W_c} [\mathbf{c}_t \; ; \; \mathbf{s}_t])$.
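As a rough illustration, the following NumPy sketch computes a dot-product (Luong) score, the context vector, and then the attentional hidden state above; the shapes and the choice of the simple dot-style score are simplifying assumptions:

```python
import numpy as np

def luong_attention(decoder_state, encoder_states, W_c):
    """Multiplicative (Luong) attention with the 'dot' score, then s~_t = tanh(W_c [c_t ; s_t]).

    decoder_state:  (d,)          current decoder hidden state s_t
    encoder_states: (src_len, d)  top-layer encoder states (same size as the decoder here, an assumption)
    W_c:            (d, 2*d)      learned projection applied to the concatenation [c_t ; s_t]
    """
    scores = encoder_states @ decoder_state    # dot score s_t . h_j for every source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over source positions
    context = weights @ encoder_states         # context vector c_t
    # Attentional hidden state, used to predict the next word.
    s_tilde = np.tanh(W_c @ np.concatenate([context, decoder_state]))
    return weights, s_tilde
```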

What is Self attention?

Self Attention, also called intra Attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the same sequence. It has been shown to be very useful in machine reading, abstractive summarization, or image description generation.


1 Answer

I went through Effective Approaches to Attention-based Neural Machine Translation. In section 3.1 the authors describe the differences between the two attentions as follows:

  1. Luong attention uses the top-layer hidden states of both the encoder and the decoder, whereas Bahdanau attention takes the concatenation of the forward and backward hidden states of the (bidirectional) source encoder.

  2. In Luong attention, the decoder hidden state at time t is computed first. The attention scores are then calculated and used to form the context vector, which is concatenated with the decoder hidden state before making the prediction.

    In Bahdanau attention, at time t we instead use the decoder hidden state from time t-1. The alignment scores and context vector are computed as above, but the context is concatenated with the decoder hidden state at t-1, and this concatenated vector is fed through a GRU before the softmax.

  3. Luong defines several alignment score functions (dot, general, and concat); Bahdanau uses only the concat (additive) alignment score, as sketched after the figure below.

Alignment methods
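For reference, here is a small Python sketch of those alignment score functions; the dispatch-style helper and the argument names are just for illustration:

```python
import numpy as np

def alignment_score(s_t, h_j, method="dot", W_a=None, v_a=None):
    """Alignment score between a decoder state s_t and one encoder state h_j.

    "dot":     s_t . h_j
    "general": s_t . (W_a h_j)
    "concat":  v_a . tanh(W_a [s_t ; h_j])  -- the additive form, the one Bahdanau uses
    """
    if method == "dot":
        return s_t @ h_j
    if method == "general":
        return s_t @ (W_a @ h_j)
    if method == "concat":
        return v_a @ np.tanh(W_a @ np.concatenate([s_t, h_j]))
    raise ValueError(f"unknown method: {method}")
```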

Shamane Siriwardhana answered Sep 28 '22 05:09