
Gensim: How to save LDA model's produced topics to a readable format (csv,txt,etc)?

Tags:

python

gensim

lda

The last part of my code:

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
print(lda)

bash output:

INFO : adding document #0 to Dictionary(0 unique tokens)
INFO : built Dictionary(18 unique tokens) from 5 documents (total  20 corpus positions)
INFO : using serial LDA version on this node
INFO : running online LDA training, 2 topics, 1 passes over the supplied corpus of 5 documents, updating model once every 5 documents
WARNING : too few updates, training might not converge; consider increasing the number of passes to improve accuracy
INFO : PROGRESS: iteration 0, at document #5/5
INFO : 2/5 documents converged within 50 iterations
INFO : topic #0: 0.079*cute + 0.076*broccoli + 0.070*adopted + 0.069*yesterday + 0.069*eat + 0.069*sister + 0.068*kitten + 0.068*kittens + 0.067*bananas + 0.067*chinchillas
INFO : topic #1: 0.082*broccoli + 0.079*cute + 0.071*piece + 0.070*munching + 0.069*spinach + 0.068*hamster + 0.068*ate + 0.067*banana + 0.066*breakfast + 0.066*smoothie
INFO : topic diff=0.470477, rho=1.000000
<gensim.models.ldamodel.LdaModel object at 0x10f1f4050>

So I'm wondering whether I can save the topics it generated in a readable format. I've tried the .save() method, but it always outputs something unreadable.

jeremy.ting asked Jun 27 '13 22:06



3 Answers

Here is how to save a gensim LDA model:

from gensim import corpora, models

# create corpus and dictionary
corpus = ...
dictionary = ...

# train the model, this might take some time
model = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=200, passes=5, alpha='auto')
# save the model to disk (no need to use the pickle module)
model.save('lda.model')

To print topics, here are a few ways:

# later on, load the trained model from file
model = models.LdaModel.load('lda.model')

# print all topics (older gensim releases named these arguments topics= and topn=)
print(model.show_topics(num_topics=200, num_words=20))

# print topic 109
print(model.print_topic(109, topn=20))

# another way
for i in range(model.num_topics):
    print(model.print_topic(i))

# and another way, only prints top words
# (recent gensim returns (word, probability) pairs; older releases returned (probability, word))
for t in range(model.num_topics):
    print('topic {}: '.format(t) + ', '.join(word for word, _ in model.show_topic(t, 20)))
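
The loops above print topics to the console. To get a readable file instead, the same (word, weight) pairs can be written out with the standard csv module. This is only a minimal sketch and not part of the original answer: write_topics_csv, the column names, and sample_topics are hypothetical, and the sample values are copied from the log output in the question. It assumes a recent gensim, where show_topics(..., formatted=False) yields (topic_id, [(word, weight), ...]) pairs.

```python
import csv
import os
import tempfile


def write_topics_csv(topics, path):
    """Write (topic_id, [(word, weight), ...]) pairs to CSV, one row per word."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["topic", "rank", "word", "weight"])
        for topic_id, terms in topics:
            for rank, (word, weight) in enumerate(terms, start=1):
                writer.writerow([topic_id, rank, word, weight])


# Stand-in data: the top three terms of each topic from the question's log.
# With a trained model (assumption) this would come from something like
# model.show_topics(num_topics=-1, num_words=10, formatted=False).
sample_topics = [
    (0, [("cute", 0.079), ("broccoli", 0.076), ("adopted", 0.070)]),
    (1, [("broccoli", 0.082), ("cute", 0.079), ("piece", 0.071)]),
]

csv_path = os.path.join(tempfile.mkdtemp(), "topics.csv")
write_topics_csv(sample_topics, csv_path)

# Read the file back to confirm it is plain, readable text.
with open(csv_path, newline="") as handle:
    rows = list(csv.reader(handle))
print(rows[1])
```

One row per (topic, word) pair keeps the file easy to filter or pivot in a spreadsheet, which a single formatted string per topic would not.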
Renaud answered Sep 28 '22 09:09


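You just need to call lda.show_topics(num_topics=-1) to get every topic, or pass any number of topics you want (num_topics=10, num_topics=15, num_topics=1000, ...). Older gensim releases called this argument topics (and num_words was topn). I usually just do:

with open('.../yourfile.txt', 'a') as logfile:
    logfile.write(str(lda.show_topics(num_topics=-1, num_words=10)) + '\n')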

These parameters, and others, are described in the gensim documentation.
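
Writing the whole list in one go puts every topic on a single line. If one topic per line is easier to read, a sketch along these lines could work. It is an illustration under assumptions, not the answer's own method: format_topic_line and sample_topics are hypothetical names, the sample values come from the log in the question, and with a real model the pairs would be obtained from show_topics with formatted=False in a recent gensim.

```python
import os
import tempfile


def format_topic_line(topic_id, terms):
    """Render one topic as 'topic 0: cute (0.079), broccoli (0.076)'."""
    parts = ["{} ({:.3f})".format(word, weight) for word, weight in terms]
    return "topic {}: {}".format(topic_id, ", ".join(parts))


# Stand-in for the model's topics (top three terms from the question's log).
sample_topics = [
    (0, [("cute", 0.079), ("broccoli", 0.076), ("adopted", 0.070)]),
    (1, [("broccoli", 0.082), ("cute", 0.079), ("piece", 0.071)]),
]

txt_path = os.path.join(tempfile.mkdtemp(), "topics.txt")

# Append mode, as in the answer, so repeated runs accumulate in one file.
with open(txt_path, "a") as logfile:
    for topic_id, terms in sample_topics:
        logfile.write(format_topic_line(topic_id, terms) + "\n")

with open(txt_path) as logfile:
    lines = logfile.read().splitlines()
print(lines[0])
```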

Everst answered Sep 28 '22 07:09


You can use the pickle module.

import pickle
# your code
with open(filename, 'wb') as f:  # pickle writes bytes, so open in binary mode
    pickle.dump(lda, f)
# you can load it back again
with open(filename, 'rb') as f:
    lda_copy = pickle.load(f)
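
As a small self-contained check of the round trip, the sketch below pickles a plain dictionary standing in for the model (an assumption, so that it runs without gensim; the values are taken from the log in the question). It also shows that the file holds bytes rather than text, so pickle restores the object but does not by itself produce a human-readable file.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: topic id -> top (word, weight) pairs.
topics = {
    0: [("cute", 0.079), ("broccoli", 0.076)],
    1: [("broccoli", 0.082), ("cute", 0.079)],
}

pickle_path = os.path.join(tempfile.mkdtemp(), "topics.pkl")

# Binary mode is required for both writing and reading.
with open(pickle_path, "wb") as handle:
    pickle.dump(topics, handle)

with open(pickle_path, "rb") as handle:
    restored = pickle.load(handle)

# The raw file content is a byte string in pickle's own format.
with open(pickle_path, "rb") as handle:
    raw = handle.read()

print(restored == topics)
```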
Nik answered Sep 28 '22 07:09