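from gensim.models import word2vec

# Text8Corpus streams sentences from a text8-style plain-text file
sentences = word2vec.Text8Corpus('TextFile')
# Note: in gensim >= 4.0 the `size` parameter is named `vector_size`
model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4)
# Note: in gensim >= 4.0 this lookup is written model.wv['king']
print(model['king'])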
Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both the context vector and the word embedding vector of 'king'? Thanks!
Word embeddings can be generated using various methods, such as neural networks, co-occurrence matrices, and probabilistic models. Word2Vec is a family of models for generating word embeddings. These models are shallow two-layer neural networks with one input layer, one hidden layer, and one output layer.
Word2Vec can capture the context of a word in a document, semantic and syntactic similarity, and relations with other words.
It is often stated that word2vec and GloVe are non-contextual embeddings while LSTM and Transformer-based (e.g. BERT) embeddings are contextual.
CBOW (continuous bag of words) and skip-gram are the two main architectures associated with word2vec. Given an input word, skip-gram tries to predict the words that appear in its context, whereas CBOW takes the surrounding context words and tries to predict the missing center word.
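To make the difference concrete, here is a minimal sketch of how training examples could be built for each architecture. The function names `skipgram_pairs` and `cbow_examples` and the `window` parameter are illustrative assumptions; they are not part of gensim or the original word2vec code, and real implementations add details such as subsampling and randomized window sizes.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples: the neighbors predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


tokens = ["the", "king", "rules", "the", "land"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
print(sg[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'rules')]
print(cb[1])   # (['the', 'rules'], 'king')
```

In the skip-gram pairs 'king' appears both as a center word and as a context word, which is why the model ends up with two sets of vectors.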
It is the embedding vector for 'king'.
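If you use hierarchical softmax, the output-layer weights are:
model.syn1
Note that with hierarchical softmax these rows correspond to inner nodes of the Huffman tree rather than to individual words. If you use negative sampling, the context vectors are:
model.syn1neg
With negative sampling, the context vector for a given word can be accessed by (in gensim >= 4.0, use model.wv.key_to_index[word] in place of model.vocab[word].index):
model.syn1neg[model.vocab[word].index]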
A 'context vector' is also a 'word embedding' vector. Word embedding refers to how vocabulary words are mapped to vectors of real numbers.
I assume you meant the center word's vector when you said 'word embedding' vector.
In the word2vec algorithm, training creates two different vectors for each word (one for when 'king' is used as the center word and one for when it is used as a context word).
I don't know how gensim treats these two vectors, but normally people either average the context and center vectors or concatenate them. It might not be the most elegant way to treat the vectors, but it works very well.
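So when you call model['king'] on some pre-trained vectors, the vector you see may be an averaged version of the two vectors. For a model trained with gensim's Word2Vec, however, model['king'] returns only the center-word (input) vector.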
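As a sketch of the two-vectors-per-word idea, the toy example below keeps a separate center (input) matrix and context (output) matrix and shows how the two vectors for one word could be averaged or concatenated. The matrices, their values, and the helper names are made up for illustration; they do not come from a trained model or from gensim.

```python
# Hypothetical vocabulary: word -> row index
vocab = {"king": 0, "queen": 1}

# Hypothetical weights: row i holds the vector for the word with index i.
center_vectors = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]   # word used as center word
context_vectors = [[3.0, 2.0, 1.0], [1.5, 0.5, 2.5]]  # word used as context word


def center_vector(word):
    return center_vectors[vocab[word]]


def context_vector(word):
    return context_vectors[vocab[word]]


def averaged(word):
    """Element-wise mean of the center and context vectors."""
    return [(a + b) / 2 for a, b in zip(center_vector(word), context_vector(word))]


def concatenated(word):
    """Center vector followed by the context vector (twice the dimensionality)."""
    return center_vector(word) + context_vector(word)


print(averaged("king"))      # [2.0, 2.0, 2.0]
print(concatenated("king"))  # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```

Averaging keeps the original dimensionality, while concatenation doubles it but preserves both vectors unchanged.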