How can I get the vectors for words that are not present in the word2vec vocabulary?

I have checked the previous post (link), but it doesn't seem to work for my case:

I have a pre-trained word2vec model:

from gensim.models import Word2Vec
model = Word2Vec.load('w2v_model')

Now I have a pandas dataframe with keywords:

keyword
corruption
people
budget
cambodia
.......
......

All I want is to add the vector for each keyword into its corresponding columns, but when I use model['cambodia'] it throws an error: KeyError: "word 'cambodia' not in vocabulary"

So I tried to update the model with the keyword:

model.train(['cambodia'])

But this doesn't work for me; when I use model['cambodia']

it still gives the error KeyError: "word 'cambodia' not in vocabulary". How can I add new words to the word2vec vocabulary so I can get their vectors? The expected output is:

keyword    V1         V2          V3         V4            V5         V6   
corruption 0.07397  0.290874    -0.170812   0.085428    -0.148551   0.38846 
people      ..............................................................
budget      ...........................................................
James asked Jul 04 '18

People also ask

How does Word2vec deal with unknown words?

In the case of word2vec, the vocabulary comprises all words in the input corpus, or at least those above the minimum-frequency threshold. The algorithm simply ignores words that are outside its vocabulary. However, there are ways to reframe your problem so that there are essentially no out-of-vocabulary words.
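One common reframing is to make out-of-vocabulary words impossible by design: before training, replace every token below the frequency threshold with a shared placeholder such as "<unk>", which then gets its own vector that any unseen word can fall back to. A minimal sketch (the helper name, threshold, and corpus are illustrative):

```python
from collections import Counter

def replace_rare(sentences, min_count=2, unk="<unk>"):
    # count token frequencies over the whole corpus
    freq = Counter(tok for sent in sentences for tok in sent)
    # map every rare token to the shared placeholder
    return [[tok if freq[tok] >= min_count else unk for tok in sent]
            for sent in sentences]

corpus = [["people", "fight", "corruption"],
          ["people", "discuss", "corruption"]]
print(replace_rare(corpus))
# → [['people', '<unk>', 'corruption'], ['people', '<unk>', 'corruption']]
```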

How many hidden layers are there in a Word2vec word embedding model?

Word embeddings are created using a neural network with one input layer, one hidden layer and one output layer.

How are word vectors created?

There are two common ways through which word vectors are generated: Counts of word/context co-occurrences. Predictions of context given word (skip-gram neural network models, i.e. word2vec)
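The first (count-based) route can be sketched with a tiny co-occurrence counter; each word's row of the resulting counts is a crude word vector (window size and corpus are illustrative):

```python
from collections import Counter

def cooccurrence_counts(sentences, window=1):
    counts = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            # every neighbour within the window counts as a context
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sent[j])] += 1
    return counts

corpus = [["cut", "the", "budget"], ["raise", "the", "budget"]]
counts = cooccurrence_counts(corpus)
print(counts[("the", "budget")])  # → 2 (once per sentence)
```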

Is Word2vec bag of words?

Word2vec is not a single algorithm but a combination of two techniques: CBOW (continuous bag of words) and the skip-gram model. Both are shallow neural networks that map word(s) to a target word, and both learn weights that act as the word vector representations.
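The difference between the two techniques shows up in how training pairs are formed: skip-gram predicts each context word from the target word, while CBOW predicts the target word from its combined context. A sketch of the pair generation only (the networks themselves are omitted; function names are illustrative):

```python
def skipgram_pairs(sent, window=1):
    # (input word, context word to predict) pairs
    pairs = []
    for i, target in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                pairs.append((target, sent[j]))
    return pairs

def cbow_pairs(sent, window=1):
    # (context word list, target word to predict) pairs
    pairs = []
    for i, target in enumerate(sent):
        context = [sent[j]
                   for j in range(max(0, i - window),
                                  min(len(sent), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

sent = ["people", "fight", "corruption"]
print(skipgram_pairs(sent))
# → [('people', 'fight'), ('fight', 'people'), ('fight', 'corruption'), ('corruption', 'fight')]
print(cbow_pairs(sent))
# → [(['fight'], 'people'), (['people', 'corruption'], 'fight'), (['fight'], 'corruption')]
```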


1 Answer

You can reserve the zero vector [0, 0, ..., 0] for unknown words, and map every word that is not in the vocabulary to it.

id         V1         V2          V3         V4            V5         V6  
0          0          0           0           0           0           0
1       0.07397  0.290874    -0.170812   0.085428    -0.148551   0.38846 
2      ..............................................................
3      ...........................................................

You can use two dicts to solve the problem.

word2id['corruption'] = 1
vec['corruption'] = [0.07397, 0.290874, -0.170812, 0.085428, -0.148551, 0.38846]
...
word2id['cambodia'] = 0
vec['cambodia'] = [0, 0, 0, 0, 0, 0]
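Putting the vector dict to work on the asker's dataframe, a sketch using dict.get with a zero-vector default (vector values copied from the example above; the 6-dimensional size and keyword list are illustrative):

```python
import pandas as pd

dim = 6
vec = {
    "corruption": [0.07397, 0.290874, -0.170812, 0.085428, -0.148551, 0.38846],
    # ... remaining in-vocabulary words ...
}

keywords = ["corruption", "cambodia"]  # 'cambodia' is out of vocabulary
# fall back to the zero vector for any keyword missing from vec
rows = [vec.get(w, [0.0] * dim) for w in keywords]
df = pd.DataFrame(rows, index=keywords,
                  columns=[f"V{i}" for i in range(1, dim + 1)])
print(df)
```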
Wei Chen answered Oct 20 '22