
Why do Transformers in Natural Language Processing need a stack of encoders?

I am following this blog on transformers

http://jalammar.github.io/illustrated-transformer/

The only thing I don't understand is why there needs to be a stack of encoders and decoders. I understand that the multi-headed attention layers capture different representation subspaces of the problem, but I don't see why a vertical stack of encoder/decoder layers is necessary. Wouldn't a single encoder/decoder layer work?

asked Dec 18 '19 by somethingstrang

1 Answer

Stacking layers is what makes deep learning architectures powerful. A single encoder/decoder with attention would not capture the complexity needed to model an entire language or achieve high accuracy on tasks as complex as translation. Stacking encoders/decoders lets the network extract hierarchical features: each layer refines the representations produced by the layer below it, so higher layers can model increasingly abstract relationships.
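As a minimal illustrative sketch (not from the linked blog, just PyTorch's built-in modules), this is all that "stacking" means in practice: one encoder layer is defined, then repeated N times, with each layer's output feeding the next. The hyperparameters (d_model=512, nhead=8, num_layers=6) are the values from the original Transformer paper, chosen here only as an example.

```python
import torch
import torch.nn as nn

# One encoder layer: multi-head self-attention followed by a feed-forward block.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)

# The "stack": 6 identical layers applied one after another.
# Each layer re-encodes the output of the previous one.
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# Toy input: sequence length 10, batch size 2, embedding size 512.
x = torch.rand(10, 2, 512)
out = encoder(x)
print(out.shape)  # torch.Size([10, 2, 512])
```

If you set num_layers=1 the model still runs, but empirically it underperforms on hard tasks like translation, which is exactly the point of the answer above.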

answered Sep 28 '22 by ESDAIRIM