Retrieve string version of document by ID in Gensim

Question

I am using Gensim for topic modelling and have reached the point where I run similarity queries using the LSI and tf-idf models. Each query returns a set of IDs and similarities, e.g. (299501, 0.64505910873413086).

How do I get the text document that is related to the ID, in this case 299501?

I have looked at the docs for corpus, dictionary, index, and the model and cannot seem to find it.

Jason · Accepted Answer

Sadly, as far as I can tell, you have to know from the very beginning of the analysis that you'll want to retrieve documents by ID. That means creating your own mapping between IDs and the original documents, and making sure the IDs gensim uses are preserved throughout the process. As it stands, I don't think gensim keeps such a mapping handy.

I could definitely be wrong, and I'd love it if someone told me there is an easier way, but I spent many hours trying to avoid re-running a gigantic LSI model on a Wikipedia corpus, to no avail. Eventually I had to carry along a list of IDs and the associated documents so I could use gensim's output.
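For what it's worth, here is a minimal sketch of the "carry along a list" approach. It assumes the IDs in the query results are simply positions in the corpus that was fed to the similarity index, so a plain list of the original texts, kept in that same order, can serve as the mapping. The helper name `attach_documents` and the sample data are made up for illustration and are not part of gensim; the `sims` pairs are hard-coded here in place of real query output.

```python
from typing import List, Optional, Sequence, Tuple


def attach_documents(
    sims: Sequence[Tuple[int, float]],
    documents: Sequence[str],
    top_n: Optional[int] = None,
) -> List[Tuple[int, float, str]]:
    """Pair each (doc_id, similarity) result with its original text.

    Assumes doc_id is the position of the document in `documents`,
    i.e. the texts are stored in the same order as the corpus that
    was indexed.
    """
    # Rank the hits from most to least similar.
    ranked = sorted(sims, key=lambda pair: pair[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return [(doc_id, score, documents[doc_id]) for doc_id, score in ranked]


# Original texts, kept in the same order used to build the corpus.
documents = [
    "human machine interface",
    "graph of trees",
    "survey of user opinion",
]

# Stand-in for the (id, similarity) pairs returned by a similarity query.
sims = [(0, 0.12), (1, 0.91), (2, 0.45)]

results = attach_documents(sims, documents, top_n=2)
for doc_id, score, text in results:
    print(doc_id, score, text)
```

For a corpus too large to hold in memory, the same idea should work with any store keyed by position (a file of one document per line, a database table, and so on), as long as the order matches the corpus that was indexed.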

Tags: python, gensim

jisaw
