 

How to print out the full distribution of words in an LDA topic in gensim?

The lda.show_topics() method in the following code only prints the distribution of the top 10 words for each topic. How do I print the full distribution over all the words in the corpus?

from gensim import corpora, models

documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]

stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

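# TF-IDF-weighted version of the bag-of-words corpus
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lda = models.ldamodel.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)

# show_topics() returns only the top 10 words of each topic by default
for i in lda.show_topics():
    print(i)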
asked Jul 15 '13 at 20:07 by alvas



People also ask

How do you visualize LDA topics?

A latent Dirichlet allocation (LDA) model is a topic model that discovers the underlying topics in a collection of documents and infers the probability of each word within each topic. You can visualize LDA topics as word clouds, sizing each word by its probability in the topic.

What is the best way to obtain the optimal number of topics for a LDA model using Gensim?

A common rule of thumb is to train LDA models with different numbers of topics and then compare them, for example by their Jaccard similarity and topic coherence. Coherence scores a single topic by the degree of semantic similarity between its high-scoring words, i.e. whether those words tend to co-occur across the corpus.

What are the two main inputs to an LDA topic model using Gensim?

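The two main inputs to a gensim LDA model are the dictionary, which maps each distinct word to an integer id, and the corpus, which represents each document as a bag of words: a list of (word_id, count) pairs. For example, the pair (0, 1) in a document's vector means that the word with id 0 occurs once in that document.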

What is passes in LDA Gensim?

passes is the number of times training goes through the entire corpus. Together with chunksize (the number of documents in each training chunk) and update_every (how many chunks are processed between model updates; 0 means batch learning), it determines how many online training updates occur while training LDA.


1 Answer

show_topics() has a parameter, topn, that sets how many of the top N words to return from each topic's word distribution (later gensim releases name this parameter num_words). See http://radimrehurek.com/gensim/models/ldamodel.html

So instead of calling lda.show_topics() with its defaults, pass len(dictionary) to get the full word distribution for each topic:

# Current gensim names the parameter num_words; older releases used topn=len(dictionary)
for i in lda.show_topics(num_words=len(dictionary)):
    print(i)
answered Sep 18 '22 at 07:09 by alvas
