
Gensim saved dictionary has no id2token

Tags: python, nlp, gensim

I have saved a Gensim dictionary to disk. When I load it, the id2token attribute dict is not populated.

A minimal version of the code that saves the dictionary:

from gensim import corpora

dictionary = corpora.Dictionary(tag_docs)
dictionary.save("tag_dictionary_lda.pkl")

When I load it (in a Jupyter notebook), mapping tokens to IDs still works fine, but id2token does not: I cannot map IDs back to tokens, and id2token is in fact completely empty.

> dictionary = corpora.Dictionary.load("../data/tag_dictionary_lda.pkl")
> dictionary.token2id["love"]
Out: 1613

> dictionary.doc2bow(["love"])
Out: [(1613, 1)]

> dictionary.id2token[1613]
Out: 
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 dictionary.id2token[1613]

KeyError: 1613

> list(dictionary.id2token.keys())
Out: []

Any thoughts?

cjrieds asked May 09 '17 19:05


1 Answer

You don't need dictionary.id2token[1613]; index the dictionary directly with dictionary[1613], which returns the token for that ID.

Note that if you check dictionary.id2token afterwards, it won't be empty any more. That's because dictionary.id2token is built only on request, to save memory (as noted in the __init__ of the Dictionary class).
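The "built only on request" behaviour can be illustrated with a small pure-Python sketch. This is not gensim's actual code: LazyVocab is a hypothetical class written only to show the pattern, where the reverse mapping stays empty until the first lookup by ID.

```python
class LazyVocab:
    """Hypothetical stand-in for a vocabulary with a lazily built reverse map."""

    def __init__(self, tokens):
        self.token2id = {}
        for token in tokens:
            # Assign the next free ID to each token seen for the first time.
            self.token2id.setdefault(token, len(self.token2id))
        # Reverse mapping is deliberately left empty to save memory.
        self.id2token = {}

    def __getitem__(self, token_id):
        # Rebuild the reverse mapping only when it is missing or out of date.
        if len(self.id2token) != len(self.token2id):
            self.id2token = {i: t for t, i in self.token2id.items()}
        return self.id2token[token_id]


vocab = LazyVocab(["love", "peace", "love", "music"])

before = dict(vocab.id2token)          # snapshot before any ID lookup
token = vocab[vocab.token2id["love"]]  # first lookup by ID builds id2token
after = dict(vocab.id2token)           # snapshot after the lookup

print(before, token, after)
```

In this sketch, reading id2token directly before the first indexed lookup raises KeyError for every ID, which mirrors the symptom described in the question.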

Lenka Vraná answered Sep 19 '22 14:09