Hi everyone! I was reading about BERT and wanted to do text classification with its word embeddings. I came across this line of code:
pooled_output, sequence_output = self.bert_layer([input_word_ids, input_mask, segment_ids])
and then:
clf_output = sequence_output[:, 0, :]
out = Dense(1, activation='sigmoid')(clf_output)
But I can't understand the use of the pooled output. Doesn't the sequence output contain all the information, including the embedding of the [CLS] token? If so, why do we have a pooled output at all?
Thanks in advance!
The sequence output is the sequence of hidden states (embeddings) at the output of the last layer of the BERT model, and it includes the embedding of the [CLS] token. For the sentence "You are on Stackoverflow", it gives one embedding per token: one for each of the four words (assuming "Stackoverflow" is tokenized into a single token), plus one for the [CLS] token, plus one for the [SEP] token that is appended at the end, so six vectors in total.

The pooled output is the embedding of the [CLS] token (taken from the sequence output), further processed by a Linear layer and a Tanh activation function. The Linear layer's weights are trained on the next sentence prediction (classification) objective during pretraining. For further details, please refer to the original BERT paper.
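If it helps, here is a minimal sketch of the relationship between the two outputs, using the Hugging Face transformers library (my assumption here; your snippet uses a TF Hub layer, but the two outputs are analogous). It recomputes the pooled output from the [CLS] vector in the sequence output:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # loaded in eval mode by default

inputs = tokenizer("You are on Stackoverflow", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

sequence_output = outputs.last_hidden_state  # shape (1, seq_len, 768): one vector per token, incl. [CLS] and [SEP]
pooled_output = outputs.pooler_output        # shape (1, 768): the processed [CLS] embedding

cls_embedding = sequence_output[:, 0, :]     # the raw [CLS] vector, what sequence_output[:, 0, :] picks out

# pooled output = Tanh(Linear([CLS] embedding)), i.e. the model's pooler applied to [CLS]
recomputed = torch.tanh(model.pooler.dense(cls_embedding))
print(torch.allclose(recomputed, pooled_output, atol=1e-5))  # True

So for your classifier you can feed either the raw [CLS] vector (as in your snippet) or the pooled output into the final Dense layer; the pooled output is simply the [CLS] vector after that extra pretrained Linear + Tanh step.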