 

How to determine the number of topics for LDA?


I am new to LDA and want to use it in my work. However, I have run into some problems.

To get the best performance, I want to estimate the best number of topics. After reading "Finding Scientific Topics", I understand that I can first compute log P(w|z) and then use the harmonic mean of a series of P(w|z) values to estimate P(w|T).

My question is: what does "a series of" mean here?
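For context, in Griffiths and Steyvers' paper the "series" is a set of topic assignments z^(1), ..., z^(M) drawn from the posterior P(z|w,T) by the Gibbs sampler, typically recorded every few iterations after burn-in. P(w|T) is then approximated by the harmonic mean of the corresponding P(w|z^(m)) values:

$$P(\mathbf{w}\mid T) \approx \left( \frac{1}{M} \sum_{m=1}^{M} \frac{1}{P(\mathbf{w}\mid \mathbf{z}^{(m)})} \right)^{-1}$$

Because P(w|z) underflows to zero for any realistic corpus, it has to be computed in log space. Below is a minimal sketch, assuming you already have the log P(w|z^(m)) values for each candidate T; the function name and the sample numbers are hypothetical placeholders, not output from a real run.

```python
import math
from typing import Sequence


def log_harmonic_mean(log_likelihoods: Sequence[float]) -> float:
    """Estimate log P(w|T) from Gibbs samples log P(w|z^(m)), m = 1..M.

    Computes log(M / sum_m exp(-log_likelihoods[m])) with the
    log-sum-exp trick so that nothing underflows.
    """
    if not log_likelihoods:
        raise ValueError("need at least one Gibbs sample")
    neg = [-ll for ll in log_likelihoods]
    shift = max(neg)  # factor out the largest term for numerical stability
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in neg))
    return math.log(len(neg)) - log_sum


# Hypothetical log P(w|z) values recorded after burn-in,
# one list per candidate number of topics T.
samples_by_T = {
    50: [-1_250_310.2, -1_250_295.7, -1_250_301.4],
    100: [-1_248_120.9, -1_248_133.0, -1_248_127.5],
    200: [-1_249_002.3, -1_248_990.8, -1_248_997.1],
}

# Estimate log P(w|T) for each T and keep the T with the largest estimate.
estimates = {T: log_harmonic_mean(s) for T, s in samples_by_T.items()}
best_T = max(estimates, key=estimates.get)
```

You would repeat the sampling for each T in your range and choose the T that maximizes the estimate. Be aware that the harmonic mean estimator is known to have high variance, so small differences between neighbouring values of T should not be over-interpreted.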

asked Jul 02 '13 at 09:07 by Chelsea Wang





2 Answers

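Unfortunately, there is no hard science that yields the correct answer to your question. To the best of my knowledge, the hierarchical Dirichlet process (HDP) is quite possibly the best way to arrive at the optimal number of topics, because it is a nonparametric model that infers the number of topics from the data instead of requiring you to fix it in advance.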

If you are looking for deeper analyses, this paper on HDP reports the advantages of HDP in determining the number of groups.

answered Oct 03 '22 at 10:10 by Chthonic Project



A reliable way is to compute the topic coherence for different numbers of topics and choose the model that gives the highest topic coherence. However, the model with the highest coherence is not always the one that best fits your needs, so inspect its topics before settling on it.


See this topic modeling example.
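The answer does not say which coherence measure it uses. The sketch below swaps in UMass coherence (Mimno et al., 2011) because it needs only document co-occurrence counts. It assumes each fitted model is summarized as a list of topics, each a ranked list of top words, and that the corpus is given as one set of words per document. All function names here are hypothetical, and fitting the LDA models for each K is left to whatever library you use.

```python
import math
from typing import Dict, List, Sequence, Set


def umass_coherence(top_words: Sequence[str], docs: Sequence[Set[str]]) -> float:
    """UMass coherence of one topic's ranked top words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over every pair with j < i, where
    D(...) counts the documents containing all the given words. Assumes each
    top word occurs in at least one document (true for words taken from the
    training corpus), so the denominator is never zero.
    """
    def doc_freq(*words: str) -> int:
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):  # j ranks above i in the topic
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score


def mean_coherence(topics: List[List[str]], docs: Sequence[Set[str]]) -> float:
    """Average UMass coherence over all topics of one fitted model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


def pick_k(coherence_by_k: Dict[int, float]) -> int:
    """Return the number of topics whose model had the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)


# Hypothetical tiny corpus and top words from one fitted model.
docs = [{"a", "b"}, {"a"}, {"a"}, {"c"}]
topics = [["a", "b"], ["a", "b"]]
score = mean_coherence(topics, docs)
```

In practice you would call `mean_coherence` once per candidate K, collect the scores in a dict, and pass it to `pick_k`. If you use gensim, its `CoherenceModel` class offers a `u_mass` option along with other coherence measures.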

answered Oct 03 '22 at 11:10 by Selva
