How to get vocabulary word count from gensim word2vec?

Question

I am using the gensim word2vec package in Python. I know how to get the vocabulary from the trained model, but how do I get the word count for each word in the vocabulary?

user3390629 · Accepted Answer

Each word in the vocabulary has an associated vocabulary object, which contains an index and a count.

# gensim < 4.0; w2v is a loaded KeyedVectors object (or model.wv of a trained model)
vocab_obj = w2v.vocab["word"]
vocab_obj.count

Output for the Google News w2v model: 2998437

So to get the count for each word, you would iterate over all words and vocab objects in the vocabulary.

for word, vocab_obj in w2v.vocab.items():
    # Do something with vocab_obj.count, for example print it
    print(word, vocab_obj.count)
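
The `vocab` access shown above applies to gensim releases before 4.0. In gensim 4.0 and later the `vocab` dictionary was removed, and per-word counts are read with `get_vecattr(word, "count")` instead. Below is a minimal sketch that handles both cases, assuming `kv` is a KeyedVectors object such as `model.wv`; the helper name `word_counts` is made up for illustration and is not part of gensim.

```python
def word_counts(kv):
    """Return a {word: count} dict for a gensim KeyedVectors-like object.

    `kv` is assumed to be `model.wv` or a loaded KeyedVectors instance.
    """
    if hasattr(kv, "get_vecattr"):
        # gensim >= 4.0: counts are stored as a per-key vector attribute
        return {word: int(kv.get_vecattr(word, "count")) for word in kv.index_to_key}
    # gensim < 4.0: each vocab entry is an object with .index and .count
    return {word: vocab_obj.count for word, vocab_obj in kv.vocab.items()}
```

With a trained model this would be called as `word_counts(model.wv)`. One caveat worth checking against your own data: counts are real corpus frequencies when the model was trained in gensim, but vectors loaded from a plain word2vec-format file without a vocabulary file appear to get placeholder counts that only preserve the frequency rank.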

Ahmedov · Answer

If you want a dictionary that maps each word to its count for easy retrieval later, you can build one as follows:

# gensim < 4.0: map each word to its count
w2c = dict()
for item in model.wv.vocab:
    w2c[item] = model.wv.vocab[item].count

To see the most frequent words in the model, sort the dictionary by count in descending order:

w2cSorted = dict(sorted(w2c.items(), key=lambda x: x[1], reverse=True))
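
If only the top few words are needed, sorting the whole dictionary is not required. A small sketch using `collections.Counter` from the standard library, assuming `w2c` is a word-to-count dictionary like the one built above; the function name `top_words` is hypothetical:

```python
from collections import Counter

def top_words(w2c, n=10):
    """Return the n most frequent (word, count) pairs, highest count first."""
    return Counter(w2c).most_common(n)
```

For example, `top_words(w2c, 20)` would give the 20 most frequent words as a list of tuples, which is often more convenient to print or slice than a sorted dictionary.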

Tags:

gensim

word2vec

Michelle Owen
