I have seen both terms used while reading papers about BERT and ELMo, so I wonder if there is a difference between them.
Contextualised word embeddings aim to capture word semantics in different contexts, addressing polysemy and the context-dependent nature of words.
Word embedding techniques are used to represent words mathematically. One-hot encoding, TF-IDF, Word2Vec, and FastText are frequently used word embedding methods. One of these techniques (in some cases several) is chosen depending on the nature, size, and purpose of the data being processed, as in the sketch below.
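As a minimal sketch of the simplest of these methods, one-hot encoding maps each word in a fixed vocabulary to a sparse vector with a single 1; the vocabulary here is purely illustrative, and the other methods (TF-IDF, Word2Vec, FastText) instead produce weighted or learned real-valued vectors.

```python
# Tiny sketch of one-hot encoding over an illustrative vocabulary.
vocab = ["duck", "is", "swimming", "shoots"]

def one_hot(word):
    # Each word becomes a vector of zeros with a 1 at its vocabulary index.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("duck"))  # [1, 0, 0, 0]
```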
The BERT base model uses 12 layers of transformer encoders, and the per-token output from each of these layers can be used as a word embedding.
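A minimal sketch of this idea, assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint (the example sentence and variable names are illustrative):

```python
# Extract per-layer, per-token vectors from BERT base.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The duck is swimming", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the input embeddings plus one tensor per
# encoder layer, i.e. 1 + 12 = 13 entries for BERT base.
hidden_states = outputs.hidden_states
print(len(hidden_states))        # 13
print(hidden_states[-1].shape)   # (1, sequence_length, 768)

# Any of these layers (or a combination, e.g. the last four) can serve
# as contextualized word embeddings for the tokens in the sentence.
```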
One key difference between TF-IDF and word2vec is that TF-IDF is a statistical measure applied to the terms in a document and then used to form a document vector, whereas word2vec produces a vector for each term, so more work may be needed to convert that set of vectors into a single vector or other ...
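A rough sketch of that contrast, assuming scikit-learn's `TfidfVectorizer` and gensim's `Word2Vec`; the toy corpus is purely illustrative:

```python
# TF-IDF yields one vector per document directly; word2vec yields one
# vector per word, which here is averaged to get a document vector.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

docs = ["the duck is swimming", "you shall duck when someone shoots"]

# TF-IDF: a statistical weight per term, assembled into a document vector.
tfidf = TfidfVectorizer()
doc_vectors = tfidf.fit_transform(docs)       # shape: (2, vocabulary_size)

# word2vec: one dense vector per term; extra work (a simple average here)
# is needed to obtain a single vector for a whole document.
tokenized = [d.split() for d in docs]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=50)
doc_vector = np.mean([w2v.wv[w] for w in tokenized[0]], axis=0)

print(doc_vectors.shape, doc_vector.shape)    # (2, V) (50,)
```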
Consider the two sentences "The duck is swimming" and "You shall duck when someone shoots at you". With traditional word embeddings, the word vector for "duck" would be the same in both sentences, whereas in the contextualized case it should differ. So in short, a contextualized word embedding represents a word in a context, whereas a sentence encoding represents a whole sentence.
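A small sketch of that effect, again assuming the Hugging Face `transformers` library and `bert-base-uncased` (where "duck" happens to be a single wordpiece); the helper function is illustrative:

```python
# The vector for "duck" from BERT's last hidden layer differs between
# the two sentences, unlike a static word embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def duck_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**inputs).last_hidden_state[0]
    # Locate the position of the (lowercased) token "duck" in this sentence.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return last_hidden[tokens.index("duck")]

v1 = duck_vector("The duck is swimming")
v2 = duck_vector("You shall duck when someone shoots at you")

# The same surface word gets noticeably different contextual vectors.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```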