 

Long sequences in a seq2seq model with attention?

I am following along with this PyTorch tutorial and trying to apply the same principle to summarization, where the encoder input would be around 1000 words and the decoder target around 200 words.

How do I apply seq2seq to this? I know it would be very expensive and almost infeasible to run through the whole sequence of 1000 words at once. Dividing the sequence into, say, 20 subsequences and running them in parallel could be an answer, but I'm not sure how to implement it; I also want to incorporate attention into it.

asked Jun 04 '17 by vijendra rana


People also ask

What is the attention mechanism in the Seq2Seq model?

A Seq2Seq model with an attention mechanism consists of an encoder, a decoder, and an attention layer. The decoder decides which part of the source sentence it needs to pay attention to, instead of having the encoder compress all the information of the source sentence into a fixed-length vector.
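As a rough sketch of that idea in PyTorch (the module name `AdditiveAttention`, the single shared hidden size, and the tensor shapes are assumptions for illustration, not code from the tutorial), an additive attention layer scores each encoder step against the current decoder state and returns a weighted context:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Scores every encoder step against the current decoder state."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state:   (batch, hidden)
        # encoder_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.W_enc(encoder_outputs) + self.W_dec(decoder_state).unsqueeze(1)
        )).squeeze(-1)                          # (batch, src_len)
        weights = F.softmax(scores, dim=-1)     # attention over source steps
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights                 # (batch, hidden), (batch, src_len)
```

The decoder would call this at every output step and feed the returned context into its next prediction.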

How large is the context matrix in an attention Seq2Seq model?

The encoder processes all the inputs by transforming them into a single vector, called context (usually with a length of 256, 512, or 1024). The context contains all the information that the encoder was able to detect from the input (remember that the input is the sentence to be translated in this case).
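As a quick illustration of that fixed length (the GRU encoder, the 300-dimensional embeddings, and the hidden size of 512 below are assumptions, not values from the question), the encoder's final hidden state, i.e. the context, has the same shape whether the input is 10 or 1000 steps long:

```python
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=300, hidden_size=512, batch_first=True)

short_input = torch.randn(1, 10, 300)     # a 10-token sentence (already embedded)
long_input = torch.randn(1, 1000, 300)    # a 1000-token document (already embedded)

_, ctx_short = encoder(short_input)
_, ctx_long = encoder(long_input)
print(ctx_short.shape, ctx_long.shape)    # both torch.Size([1, 1, 512])
```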

What is attention in RNNs?

Attention is a mechanism combined with an RNN that allows it to focus on certain parts of the input sequence when predicting a certain part of the output sequence, enabling easier and higher-quality learning.

What is attention in encoder decoder?

Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks.


1 Answer

You cannot parallelize an RNN over time (the 1000 steps here) because RNNs are inherently sequential.
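To see why, here is what a hand-unrolled recurrent loop looks like (a minimal sketch with made-up shapes, using `nn.GRUCell`): each step needs the hidden state from the previous step, so the 1000 steps cannot be computed simultaneously.

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=300, hidden_size=512)
inputs = torch.randn(1000, 1, 300)   # (seq_len, batch, input_size), already embedded
h = torch.zeros(1, 512)

for x_t in inputs:                   # step t needs h from step t-1,
    h = cell(x_t, h)                 # so this loop is inherently serial
```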

You can use a lightweight RNN, something like QRNN or SRU, as a faster alternative (which is still sequential).

Other common sequence-processing modules are TCNs and Transformers, which are both parallelizable over time.
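As a minimal sketch of the Transformer option using PyTorch's built-in `nn.TransformerEncoder` (the hyperparameters are arbitrary, and a real model would also need an embedding layer and positional encodings):

```python
import torch
import torch.nn as nn

# Self-attention looks at all 1000 positions at once,
# so the whole sequence is processed in parallel on the GPU.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(layer, num_layers=4)

src = torch.randn(1000, 1, 512)   # (seq_len, batch, d_model), already embedded
memory = encoder(src)             # (1000, 1, 512) states for a decoder to attend over
```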

Also, note that all of them can be used with attention and work perfectly fine with text.

answered Oct 13 '22 by Separius