I have seen both terms used while reading papers about BERT and ELMo, so I wonder if there is a difference between them.
Contextualised word embeddings aim to capture word semantics in different contexts, addressing polysemy and the context-dependent nature of words.
Word embedding techniques are used to represent words mathematically. One-hot encoding, TF-IDF, Word2Vec, and FastText are frequently used word embedding methods. One of these techniques (in some cases several) is chosen depending on the nature, size, and purpose of the data being processed, as in the sketch below.
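As a minimal sketch of the simplest of these methods, one-hot encoding maps each word in a fixed vocabulary to a sparse vector with a single 1; the vocabulary here is purely illustrative, and the other methods (TF-IDF, Word2Vec, FastText) instead produce weighted or learned real-valued vectors.

```python
# Tiny sketch of one-hot encoding over an illustrative vocabulary.
vocab = ["duck", "is", "swimming", "shoots"]

def one_hot(word):
    # Each word becomes a vector of zeros with a 1 at its vocabulary index.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("duck"))  # [1, 0, 0, 0]
```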
The BERT base model uses 12 layers of transformer encoders, and the per-token output from each of these layers can be used as a word embedding.
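A minimal sketch of this idea, assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint (the example sentence and variable names are illustrative):

```python
# Extract per-layer, per-token vectors from BERT base.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The duck is swimming", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the input embeddings plus one tensor per
# encoder layer, i.e. 1 + 12 = 13 entries for BERT base.
hidden_states = outputs.hidden_states
print(len(hidden_states))        # 13
print(hidden_states[-1].shape)   # (1, sequence_length, 768)

# Any of these layers (or a combination, e.g. the last four) can serve
# as contextualized word embeddings for the tokens in the sentence.
```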
One key difference between TF-IDF and word2vec is that TF-IDF is a statistical measure applied to the terms in a document and then used to form a document vector, whereas word2vec produces a vector for each term, so more work may be needed to convert that set of vectors into a single vector or other ...
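A rough sketch of that contrast, assuming scikit-learn's `TfidfVectorizer` and gensim's `Word2Vec`; the toy corpus is purely illustrative:

```python
# TF-IDF yields one vector per document directly; word2vec yields one
# vector per word, which here is averaged to get a document vector.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

docs = ["the duck is swimming", "you shall duck when someone shoots"]

# TF-IDF: a statistical weight per term, assembled into a document vector.
tfidf = TfidfVectorizer()
doc_vectors = tfidf.fit_transform(docs)       # shape: (2, vocabulary_size)

# word2vec: one dense vector per term; extra work (a simple average here)
# is needed to obtain a single vector for a whole document.
tokenized = [d.split() for d in docs]
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=50)
doc_vector = np.mean([w2v.wv[w] for w in tokenized[0]], axis=0)

print(doc_vectors.shape, doc_vector.shape)    # (2, V) (50,)
```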
Consider the two sentences "The duck is swimming" and "You shall duck when someone shoots at you". With traditional word embeddings, the word vector for "duck" would be the same in both sentences, whereas in the contextualized case it should differ. So in short, a contextualized word embedding represents a word in a context, whereas a sentence encoding represents a whole sentence.
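A small sketch of that effect, again assuming the Hugging Face `transformers` library and `bert-base-uncased` (where "duck" happens to be a single wordpiece); the helper function is illustrative:

```python
# The vector for "duck" from BERT's last hidden layer differs between
# the two sentences, unlike a static word embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def duck_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        last_hidden = model(**inputs).last_hidden_state[0]
    # Locate the position of the (lowercased) token "duck" in this sentence.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return last_hidden[tokens.index("duck")]

v1 = duck_vector("The duck is swimming")
v2 = duck_vector("You shall duck when someone shoots at you")

# The same surface word gets noticeably different contextual vectors.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```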