I'm trying to model Twitter stream data with topic models. Gensim, being an easy-to-use solution, is impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. For a changing content stream like Twitter, Dynamic Topic Models are ideal. Is there any way, or even a hack (an implementation or a strategy), with which I can use Gensim for this purpose?
Are there any other Python implementations, either derived (preferably) from Gensim or independent of it? I prefer Python since I want to get started as soon as possible, but if there is an optimal solution that takes some work, please mention it.
Thanks.
Gensim (http://radimrehurek.com/gensim/models/dtmmodel.html) has a Python wrapper for the original C++ code.
The DTM wrapper in Gensim works, but none of the documentation is particularly complete at this time. On the Gensim side, the most useful thing to look at is the DTM example buried in docs/notebooks. It shows what each of the input variables needs to look like. A couple of things to note:

- When using `gensim.models.wrappers.dtmmodel`, `initialize_lda=True` must be set because of a bug in the DTM code (this will become the default in the future; see PR #676).
- You'll also need a working compiled version of DTM itself (you provide the path to that executable). You can try the appropriate prebuilt executable from a GitHub repo, but if that doesn't work you'll probably need to compile the original code by running the included makefile.
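For a Twitter stream, the two inputs that take the most preparation are a time-ordered bag-of-words corpus and `time_slices`, the number of documents in each consecutive time slice (the counts should sum to the corpus length). Below is a minimal, standard-library-only sketch of building both. The (timestamp, text) tweet format, the naive tokenizer, and the helper names `tokenize` and `build_dtm_inputs` are hypothetical; each document uses gensim's usual corpus format, a list of (token_id, count) tuples.

```python
import re
from collections import Counter
from datetime import datetime
from itertools import groupby


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_dtm_inputs(tweets, slice_key):
    """Turn (timestamp, text) pairs into (corpus, time_slices, id2word).

    tweets: iterable of (datetime, str) pairs (an assumed input format).
    slice_key: maps a datetime to a slice label; it must be monotonic in time,
        e.g. lambda ts: ts.date() for one slice per day.
    """
    # Documents must be in time order so that consecutive runs form slices.
    ordered = sorted(tweets, key=lambda pair: pair[0])
    token2id = {}
    corpus = []
    time_slices = []
    for _, group in groupby(ordered, key=lambda pair: slice_key(pair[0])):
        docs_in_slice = 0
        for _, text in group:
            bow = Counter()
            for token in tokenize(text):
                # Assign a new integer id the first time a token is seen.
                token_id = token2id.setdefault(token, len(token2id))
                bow[token_id] += 1
            corpus.append(sorted(bow.items()))
            docs_in_slice += 1
        time_slices.append(docs_in_slice)
    id2word = {i: w for w, i in token2id.items()}
    return corpus, time_slices, id2word


if __name__ == "__main__":
    tweets = [
        (datetime(2016, 5, 2, 9, 0), "Gensim topic models for tweets"),
        (datetime(2016, 5, 1, 8, 30), "Dynamic topic models track topics over time"),
        (datetime(2016, 5, 1, 12, 0), "topic models are fun"),
    ]
    corpus, time_slices, id2word = build_dtm_inputs(tweets, lambda ts: ts.date())
    print(time_slices)  # [2, 1]
    # Assumed usage with a gensim version that still ships gensim.models.wrappers
    # (the 3.x line); the binary path is a placeholder:
    # from gensim.models.wrappers import DtmModel
    # model = DtmModel("path/to/dtm/binary", corpus=corpus, time_slices=time_slices,
    #                  num_topics=5, id2word=id2word, initialize_lda=True)
```

Periods with no tweets simply produce no slice in this sketch; if you need evenly spaced slices, you would have to decide how to handle empty periods yourself. In practice you may prefer `gensim.corpora.Dictionary` over the plain dict used here for `id2word`.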