Here are the last parts of the code:
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)
print lda
Console output:
INFO : adding document #0 to Dictionary(0 unique tokens)
INFO : built Dictionary(18 unique tokens) from 5 documents (total 20 corpus positions)
INFO : using serial LDA version on this node
INFO : running online LDA training, 2 topics, 1 passes over the supplied corpus of 5 documents, updating model once every 5 documents
WARNING : too few updates, training might not converge; consider increasing the number of passes to improve accuracy
INFO : PROGRESS: iteration 0, at document #5/5
INFO : 2/5 documents converged within 50 iterations
INFO : topic #0: 0.079*cute + 0.076*broccoli + 0.070*adopted + 0.069*yesterday + 0.069*eat + 0.069*sister + 0.068*kitten + 0.068*kittens + 0.067*bananas + 0.067*chinchillas
INFO : topic #1: 0.082*broccoli + 0.079*cute + 0.071*piece + 0.070*munching + 0.069*spinach + 0.068*hamster + 0.068*ate + 0.067*banana + 0.066*breakfast + 0.066*smoothie
INFO : topic diff=0.470477, rho=1.000000
<gensim.models.ldamodel.LdaModel object at 0x10f1f4050>
So I'm wondering if I'm able to save the resulting topics that it generated to a readable format. I've tried the .save()
method, but it always outputs something unreadable.
A latent Dirichlet allocation (LDA) model is a topic model that discovers underlying topics in a collection of documents and infers per-topic word probabilities. You can visualize the LDA topics as word clouds by displaying words sized according to their topic word probabilities.
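As a minimal sketch of that idea (not part of gensim), the third-party wordcloud and matplotlib packages can render one topic's word probabilities as a word cloud; note that generate_from_frequencies expects a word-to-weight mapping in recent wordcloud versions, and that the pair order returned by show_topic differs between gensim versions:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# In the gensim version used in this post, show_topic() yields
# (probability, word) pairs (the loops further down index them the same way);
# newer gensim versions return (word, probability), so swap the unpacking if needed.
freqs = dict((word, prob) for prob, word in lda.show_topic(0, 20))

wc = WordCloud(background_color='white').generate_from_frequencies(freqs)
plt.imshow(wc)
plt.axis('off')
plt.show()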
Latent Dirichlet Allocation (LDA) is a popular algorithm for topic modeling, with excellent implementations in Python's Gensim package. The challenge, however, is extracting topics that are clear, well separated, and meaningful.
chunksize - number of documents to consider at once (affects memory consumption)
update_every - update the model every update_every chunks, i.e. every update_every * chunksize documents (essentially a memory-consumption optimization)
passes - how many times the algorithm passes over the whole corpus
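For example, here is a minimal sketch of where these parameters go in the LdaModel call from the question (the values are arbitrary placeholders, not recommendations):

lda = LdaModel(corpus=corpus,
               id2word=dictionary,
               num_topics=2,
               chunksize=100,     # documents held in memory per training chunk
               update_every=1,    # update the model after every chunk
               passes=10)         # full sweeps over the corpus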
Here is how to save a model for gensim LDA:
from gensim import corpora, models, similarities
# create corpus and dictionary
corpus = ...
dictionary = ...
# train the model; this might take some time
model = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=200, passes=5, alpha='auto')
# save model to disk (no need to use pickle module)
model.save('lda.model')
To print topics, here are a few ways:
# later on, load trained model from file
model = models.LdaModel.load('lda.model')
# print all topics
model.show_topics(topics=200, topn=20)
# print a single topic, e.g. topic 109
model.print_topic(109, topn=20)
# another way
for i in range(model.num_topics):
    print model.print_topic(i)
# and another way, only prints top words
for t in range(model.num_topics):
    print 'topic {}: '.format(t) + ', '.join([v[1] for v in model.show_topic(t, 20)])
You just need to use lda.show_topics(topics=-1), or any number of topics you want (topics=10, topics=15, topics=1000, ...). I usually just do:
logfile = open('.../yourfile.txt', 'a')
print>>logfile, lda.show_topics(topics=-1, topn=10)
All these parameters and others are available in the gensim documentation.
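If you want something more readable than dumping show_topics() wholesale, here is a small sketch (plain Python 2, matching the rest of this post) that writes one topic per line; the file name is a placeholder, and it assumes the (probability, word) pair order used in the loops above:

out = open('topics.txt', 'w')  # placeholder file name
for t in range(lda.num_topics):
    words = ', '.join(word for prob, word in lda.show_topic(t, 10))
    out.write('topic {}: {}\n'.format(t, words))
out.close()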
You may also use the pickle module.
import pickle
# your code
pickle.dump(lda, open(filename, 'wb'))
# you may load it back again
lda_copy = pickle.load(open(filename, 'rb'))