 

what is the difference between pooled output and sequence output in bert layer?

Hi everyone! I was reading about BERT and wanted to do text classification with its word embeddings. I came across this line of code:

pooled_output, sequence_output = self.bert_layer([input_word_ids, input_mask, segment_ids])   

and then:

clf_output = sequence_output[:, 0, :]
out = Dense(1, activation='sigmoid')(clf_output)

But I can't understand the use of pooled output. Doesn't sequence output contain all the information, including the word embedding of [CLS]? If so, why do we have pooled output?

Thanks in advance!

asked Sep 18 '25 by mitra mirshafiee

1 Answer

Sequence output is the sequence of hidden states (embeddings) at the output of the last layer of the BERT model. It includes the embedding of the [CLS] token. Hence, for the sentence "You are on Stackoverflow", it gives 5 embeddings: one embedding for each of the four words (assuming the word "Stackoverflow" was tokenized into a single token), along with the embedding of the [CLS] token.

Pooled output is the embedding of the [CLS] token (taken from sequence output), further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pretraining. For further details, please refer to the original BERT paper.
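To make the relationship concrete, here is a minimal NumPy sketch of how pooled output is derived from sequence output. The weights and shapes below are random/illustrative stand-ins (real BERT-base uses hidden_size=768 and trained pooler weights), not actual BERT parameters:

```python
import numpy as np

# Illustrative sizes only; BERT-base uses hidden_size=768.
batch_size, seq_len, hidden_size = 2, 5, 8

rng = np.random.default_rng(0)

# sequence_output: one hidden state per token, with [CLS] at position 0.
sequence_output = rng.standard_normal((batch_size, seq_len, hidden_size))

# The [CLS] embedding is simply the first position of sequence_output.
cls_embedding = sequence_output[:, 0, :]        # shape (batch_size, hidden_size)

# pooled_output = tanh(Linear(cls_embedding)). In real BERT, W and b are
# the pooler weights trained on the next sentence prediction objective;
# here they are random placeholders.
W = rng.standard_normal((hidden_size, hidden_size))
b = rng.standard_normal(hidden_size)
pooled_output = np.tanh(cls_embedding @ W + b)  # shape (batch_size, hidden_size)

print(sequence_output.shape)  # (2, 5, 8)
print(pooled_output.shape)    # (2, 8)
```

This is why the classification snippet in the question can use either `sequence_output[:, 0, :]` (the raw [CLS] embedding) or the pooled output: both are per-example summary vectors, differing only by the extra Linear + Tanh transformation.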

answered Sep 21 '25 by Mustafizur Shahid