How to get both the word embeddings vector and context vector of a given word by using word2vec?

Tags: python, vector, word-embedding, word2vec

from gensim.models import word2vec

sentences = word2vec.Text8Corpus('TextFile')
model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4)
print(model['king'])

Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both the context vector and the word embedding vector of 'king'? Thanks!

asked Sep 09 '16 07:09

cai

2 Answers

It is the embedding vector for 'king'.

If you use hierarchical softmax, the context vectors are:

model.syn1

and if you use negative sampling they are:

model.syn1neg

The vectors can be accessed by:

model.syn1[model.vocab[word].index]
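The attribute names above are from older gensim releases. As a minimal sketch for gensim 4.x, assuming the model was trained with negative sampling, the lookup could be wrapped in a small helper. The function name `input_and_output_vectors` is made up for illustration; `model.wv.key_to_index`, `model.wv.vectors` and `model.syn1neg` are the gensim 4.x attributes it relies on. It deliberately does not cover hierarchical softmax: as far as I understand, the rows of `model.syn1` there belong to inner nodes of the Huffman tree rather than to individual words, so indexing by word position may not give a per-word context vector.

```python
def input_and_output_vectors(model, word):
    """Return (center/input vector, context/output vector) for `word`.

    Assumes `model` is a gensim 4.x Word2Vec trained with negative sampling
    (negative > 0), so that `model.syn1neg` has one row per vocabulary word.
    """
    syn1neg = getattr(model, "syn1neg", None)
    if syn1neg is None:
        raise ValueError("model has no syn1neg; was it trained with negative sampling?")

    idx = model.wv.key_to_index[word]   # row of the word in the vocabulary
    center = model.wv.vectors[idx]      # the same values model.wv[word] returns
    context = syn1neg[idx]              # output-layer weights for the same word
    return center, context


# Hypothetical usage with gensim 4.x (not executed here):
#   from gensim.models import Word2Vec
#   model = Word2Vec(sentences, vector_size=200, min_count=2, workers=4, sg=1, negative=5)
#   center_vec, context_vec = input_and_output_vectors(model, "king")
```

Both returned vectors have the same dimensionality, so they can be compared or combined directly.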

answered Nov 02 '22 01:11

Raphael Schumann

A 'context vector' is also a 'word embedding' vector. Word embedding refers to mapping vocabulary items to vectors of real numbers.

I assume you meant the center word's vector when you said 'word embedding' vector.

In the word2vec algorithm, training creates two different vectors for each word: one used when 'king' is the center word and one used when it is a context word.
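To make the two tables concrete, here is a toy skip-gram trainer with negative sampling in plain Python. It is only a sketch under simplifying assumptions, not gensim's implementation: negatives are drawn uniformly (real word2vec uses a smoothed unigram distribution and skips the true target), there is no subsampling or learning-rate decay, and the function name and parameters are invented for this example.

```python
import math
import random


def train_toy_sgns(sentences, dim=8, window=2, negatives=3, epochs=20, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; returns (index, w_in, w_out)."""
    rng = random.Random(seed)
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}

    # Two separate tables: w_in holds center-word vectors, w_out context vectors.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed (positive) pair plus uniformly drawn negatives.
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]

                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for out_id, label in targets:
                        u = w_out[out_id]
                        score = sum(a * b for a, b in zip(v, u))
                        pred = 1.0 / (1.0 + math.exp(-score))
                        g = lr * (label - pred)
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # accumulate using the old u[k]
                            u[k] += g * v[k]       # update the context vector
                    for k in range(dim):
                        v[k] += grad_v[k]          # update the center vector
    return index, w_in, w_out


sentences = [
    ["the", "king", "rules", "the", "land"],
    ["the", "queen", "rules", "the", "land"],
]
index, w_in, w_out = train_toy_sgns(sentences)
center_vec = w_in[index["king"]]    # 'king' as a center word
context_vec = w_out[index["king"]]  # 'king' as a context word
```

After training, each word has one row in `w_in` and one in `w_out`, and the two rows for 'king' generally differ.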

I don't know how gensim treats these two vectors, but a common approach is to average the context and center vectors, or to concatenate the two. It might not be the most elegant way to treat the vectors, but it works very well in practice.

So when you call model['king'] on some pre-trained vectors, the vector you see may be an averaged version of the two vectors, depending on how those vectors were produced.
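The two ways of combining the vectors mentioned above can be sketched in a few lines. The helper names are hypothetical, and the inputs are assumed to be plain sequences of floats of equal length.

```python
def average_vectors(center, context):
    """Element-wise mean of the two vectors; dimensionality is unchanged."""
    if len(center) != len(context):
        raise ValueError("vectors must have the same length")
    return [(a + b) / 2.0 for a, b in zip(center, context)]


def concatenate_vectors(center, context):
    """Center vector followed by context vector; dimensionality doubles."""
    return list(center) + list(context)


center = [1.0, 3.0]
context = [3.0, 5.0]
averaged = average_vectors(center, context)      # [2.0, 4.0]
stacked = concatenate_vectors(center, context)   # [1.0, 3.0, 3.0, 5.0]
```

Averaging keeps the original vector size, while concatenation preserves both vectors in full at the cost of doubling it.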

answered Nov 02 '22 02:11

aerin
