
Are there any efficient Python libraries for Dynamic Topic Models, preferably extending Gensim?

I'm trying to model Twitter stream data with topic models. Gensim is easy to use and impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. For a changing content stream like Twitter, Dynamic Topic Models are ideal. Is there any way, even a hack, an existing implementation, or just a strategy, to use Gensim for this purpose?

Are there any other Python implementations, preferably derived from Gensim but possibly independent? I prefer Python because I want to get started ASAP, but if there is a better solution that takes some more work, please mention it.

Thanks.

Ravi Karan asked Mar 18 '14 02:03


2 Answers

Gensim (http://radimrehurek.com/gensim/models/dtmmodel.html) has a Python wrapper for the original C++ code.

Daki answered Oct 17 '22 22:10


The DTM wrapper in Gensim works, but its documentation is not particularly complete at this time. On the Gensim side, the most useful thing to look at is the DTM example in docs/notebooks, which shows what each of the input variables needs to look like. A couple of things to note:

  • the DTM model has been moved to gensim.models.wrappers.dtmmodel
  • initialize_lda=True must be set because of a bug in the DTM code (this will become the default in the future; see PR #676)
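
To make the notes above concrete, here is a minimal sketch of how the inputs might be prepared. It assumes a Gensim release that still ships gensim.models.wrappers and that the wrapper takes the documents ordered by time plus a time_slices list giving the number of documents in each period. The helper names (build_time_slices, fit_dtm) and the sample data are made up for illustration and are not part of Gensim.

```python
from collections import Counter


def build_time_slices(docs):
    """docs is a list of (period, tokens) pairs.

    Returns the token lists sorted by period, plus the number of
    documents in each period (the time_slices argument).
    """
    ordered = sorted(docs, key=lambda d: d[0])  # stable sort by period
    counts = Counter(period for period, _ in ordered)
    time_slices = [counts[p] for p in sorted(counts)]
    return [tokens for _, tokens in ordered], time_slices


def fit_dtm(dtm_path, texts, time_slices, num_topics=2):
    """Hypothetical helper; not called here because it needs the DTM binary."""
    # Imported lazily: only available in Gensim releases that include wrappers.
    from gensim.corpora import Dictionary
    from gensim.models.wrappers.dtmmodel import DtmModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    return DtmModel(
        dtm_path,                 # path to the compiled DTM executable
        corpus=corpus,
        time_slices=time_slices,
        num_topics=num_topics,
        id2word=dictionary,
        initialize_lda=True,      # set explicitly, per the note above
    )


# Toy data, invented for this example.
docs = [
    ("2014-03", ["gensim", "topic", "model"]),
    ("2014-01", ["twitter", "stream", "data"]),
    ("2014-01", ["topic", "model", "twitter"]),
    ("2014-02", ["dynamic", "topic", "model"]),
]
texts, time_slices = build_time_slices(docs)
print(time_slices)  # [2, 1, 1]
```

The time slices must add up to the number of documents, so checking sum(time_slices) == len(texts) before calling the wrapper is a cheap sanity test.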

You'll also need a working compiled version of DTM itself, since you provide the path to that executable to the wrapper. You can try using the appropriate executable from a GitHub repo, but if that doesn't work you'll probably need to compile the original code by running the included makefile.
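
Since the wrapper is handed a path to the executable, it can help to check that path before starting a long run. The following is a small sketch using only the standard library; resolve_dtm_binary is a hypothetical helper name and the path in the usage comment is a placeholder, not a real location.

```python
import os
import shutil


def resolve_dtm_binary(candidate):
    """Return an absolute path to an executable file, or raise FileNotFoundError.

    candidate may be a bare command name (looked up on PATH) or a file path.
    """
    path = shutil.which(candidate) or candidate
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(
            "no executable DTM binary found at %r" % candidate
        )
    return os.path.abspath(path)


# Usage (placeholder path; substitute wherever your build ended up):
#   dtm_path = resolve_dtm_binary("/path/to/dtm/main")
```

This only confirms that the file exists and is executable; it does not verify that the binary was built for your platform.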

snl answered Oct 17 '22 22:10