
gensim word2vec accessing in/out vectors

Tags:

python

gensim

In the word2vec model, there are two linear transforms that take a word in vocab space to a hidden layer (the "in" vector), and then back to the vocab space (the "out" vector). Usually this out vector is discarded after training. I'm wondering if there's an easy way of accessing the out vector in gensim python? Equivalently, how can I access the out matrix?

Motivation: I would like to implement the ideas presented in this recent paper: A Dual Embedding Space Model for Document Ranking

Here are more details. From the reference above we have the following word2vec model:

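[Figure: word2vec architecture with an input layer, a hidden layer, and an output layer, connected by the matrices W_{IN} and W_{OUT}]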

Here, the input layer has size $V$ (the vocabulary size), the hidden layer has size $d$, and the output layer has size $V$. The two matrices are $W_{IN}$ and $W_{OUT}$. Usually, the word2vec model keeps only the $W_{IN}$ matrix. This is what is returned when, after training a word2vec model in gensim, you look up a word and get something like:

model['potato']=[-0.2,0.5,2,...]

How can I access or retain $W_{OUT}$? This is likely quite computationally expensive, and I'm really hoping for some built-in method in gensim to do this, because I'm afraid that if I code it from scratch it would not perform well.
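To make the setup concrete, here is a minimal NumPy sketch of the two transforms described above. It is only an illustration, not gensim's implementation: the sizes, the random weights, and the names `W_in`, `W_out`, and `word_id` are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                       # toy vocabulary size and hidden size (made-up values)
W_in = rng.normal(size=(V, d))    # row i is the "in" vector of word i
W_out = rng.normal(size=(d, V))   # column j is the "out" vector of word j

word_id = 2                       # hypothetical index of some word in the vocabulary
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

hidden = one_hot @ W_in           # vocab space -> hidden layer; same as W_in[word_id]
scores = hidden @ W_out           # hidden layer -> vocab space, one score per word

# Softmax over the vocabulary (shifted by the max for numerical stability).
probs = np.exp(scores - scores.max())
probs /= probs.sum()

in_vec = W_in[word_id]            # what is usually kept after training
out_vec = W_out[:, word_id]       # what is usually discarded
```

In this picture, the question is how to get at the equivalent of `out_vec` (or the whole `W_out`) from a trained gensim model.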

Alex R. asked Nov 07 '16 06:11



2 Answers

While this might not be a proper answer (I can't comment yet), no one has pointed this out: take a look here. The creator seems to answer a similar question there. That's also the place where you have a higher chance of getting a valid answer.

Digging around in the link he posted to the word2vec source code, you could change the syn1 deletion to suit your needs. Just remember to delete it after you're done, since it proves to be a memory hog.

themistoklik answered Nov 14 '22 03:11



To get the syn1 of any word, this might work.

model.syn1[model.wv.vocab['potato'].point]

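where model is your trained word2vec model. This applies to models trained with hierarchical softmax (hs=1): there, syn1 holds the output weights and .point lists the Huffman-tree nodes on the word's path, so the result is one row per node rather than a single vector. With negative sampling, the output weights are stored in model.syn1neg instead.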

Kim Jay answered Nov 14 '22 03:11
