
Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...)) and then compare my results with the pyLDAvis graph, but the topics are numbered differently there. Is there a way I can match them?

m.khalil asked Apr 06 '17 15:04



1 Answer

If it's still relevant, have a look at the documentation: http://pyldavis.readthedocs.io/en/latest/modules/API.html

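You may want to set sort_topics to False when preparing the visualization. It is True by default, which reorders the topics by their share of the corpus. With sort_topics=False, the order of topics in gensim and pyLDAvis will be the same.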

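Note also that gensim's indexing starts from 0, while pyLDAvis displays topics starting from 1, so with sort_topics=False topic k in the pyLDAvis graph corresponds to topic k - 1 in gensim. Not sure if there's a straightforward way to change the displayed numbering itself.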
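As a rough sketch of how the two numberings could be reconciled in code: the helper names below are hypothetical (they are not part of gensim or pyLDAvis), and the whole thing assumes the visualization was prepared with sort_topics=False so that only the off-by-one offset remains.

```python
# Hypothetical helpers, not part of gensim or pyLDAvis.
# Assumption: the visualization was built with sort_topics=False, e.g.
#   import pyLDAvis.gensim  # the module is named pyLDAvis.gensim_models in newer releases
#   vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
# Without sort_topics=False the simple +1 / -1 offset below does not hold.


def gensim_to_pyldavis(gensim_id):
    """Convert a 0-based gensim topic id to the 1-based number shown by pyLDAvis."""
    if gensim_id < 0:
        raise ValueError("gensim topic ids start at 0")
    return gensim_id + 1


def pyldavis_to_gensim(display_id):
    """Convert a 1-based pyLDAvis topic number back to the 0-based gensim id."""
    if display_id < 1:
        raise ValueError("pyLDAvis topic numbers start at 1")
    return display_id - 1


def relabel_topics(topics):
    """Re-key (gensim_id, words) pairs, such as those returned by
    lda.show_topics(...), by the number displayed in the pyLDAvis graph."""
    return {gensim_to_pyldavis(topic_id): words for topic_id, words in topics}
```

For example, relabel_topics(lda.show_topics(num_topics=-1, formatted=False)) would give a dict whose keys line up with the circle labels in the graph, again only under the sort_topics=False assumption.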

formi23 answered Oct 19 '22 21:10