I am following this blog on transformers
http://jalammar.github.io/illustrated-transformer/
The only thing I don't understand is why there needs to be a stack of encoders and decoders. I understand that the multi-headed attention layers capture different representation spaces of the problem, but why is a vertical stack of encoders and decoders necessary? Wouldn't a single encoder/decoder layer work?
Stacking layers is what makes any deep learning architecture powerful. A single encoder/decoder with attention would not be able to capture the complexity needed to model an entire language or achieve high accuracy on tasks as complex as language translation. Using a stack of encoders/decoders allows the network to extract hierarchical features and model complex problems: each layer refines the representations produced by the layer below it, so higher layers can build more abstract features on top of lower-level ones.
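To make the idea of a vertical stack concrete, here is a minimal PyTorch sketch (the dimensions and the choice of six layers, matching the N=6 of the original "Attention Is All You Need" setup, are just illustrative) that stacks identical encoder layers on top of each other:

```python
import torch
import torch.nn as nn

# One encoder layer: multi-head self-attention + feed-forward, as described in the blog post.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)

# Stack 6 identical layers vertically; the output of one layer is the input of the next,
# so each layer refines the token representations produced by the layer below it.
stacked_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# A batch of 10 sequences, each 32 tokens long, embedded into 512 dimensions
# (default layout is (seq_len, batch, d_model)).
src = torch.rand(32, 10, 512)
out = stacked_encoder(src)
print(out.shape)  # torch.Size([32, 10, 512]) -- same shape, progressively richer features
```

With `num_layers=1` the model still runs, but it can only apply attention and a feed-forward transform once; adding layers is what lets it compose those operations into deeper, hierarchical features.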