How to monitor convergence of Gensim LDA model?

I can't find how to do this, possibly because I'm unfamiliar with the statistical terminology. I want to produce something similar to the graph at the bottom of the LDA lib page on PyPI and observe the uniformity/convergence of the lines. How can I achieve this with Gensim LDA?

859

asked Jun 01 '16 13:06

ZeferiniX

1 Answer

You are right to want to plot the convergence of your model fit. Unfortunately, Gensim does not seem to make this very straightforward.

Run the model in such a way that you can analyze the output of the model-fitting function. I like to set up a log file before training starts.
```
import logging

# Send Gensim's INFO-level messages (including perplexity estimates) to a file.
logging.basicConfig(filename='gensim.log',
                    format="%(asctime)s:%(levelname)s:%(message)s",
                    level=logging.INFO)
```

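Set the eval_every parameter in LdaModel. The lower this value, the finer the resolution of your plot. However, computing the perplexity can slow down your fit a lot!
```
from gensim.models import LdaModel

# corpus and id2word are your own bag-of-words corpus and dictionary.
lda_model = LdaModel(corpus=corpus,
                     id2word=id2word,
                     num_topics=30,
                     eval_every=10,
                     passes=40,  # the keyword is `passes`; `pass` is a reserved word
                     iterations=5000)
```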
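Each evaluation writes two numbers to the log: a per-word likelihood bound and a perplexity estimate. As far as I can tell from Gensim's source, the perplexity is 2 raised to the negative of that bound, so the two curves carry the same information and you can plot whichever you prefer. A minimal sketch of that relationship (the helper name `perplexity_from_bound` is hypothetical, not part of Gensim):

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a per-word likelihood bound (base 2) into a perplexity.

    Assumption: perplexity = 2 ** (-bound), which is how Gensim appears
    to derive the perplexity figure it logs.
    """
    return 2.0 ** (-per_word_bound)


# A bound closer to zero means a lower (better) perplexity.
print(perplexity_from_bound(-8.0))  # 256.0
print(perplexity_from_bound(-7.0))  # 128.0
```

Because the mapping is monotonic, a likelihood curve that flattens out corresponds to a perplexity curve that flattens out at the same point.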

Parse the log file and make your plot.
```
import re
import matplotlib.pyplot as plt

# Each evaluation logs one line holding the per-word bound and the perplexity.
p = re.compile(r"(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")
with open('gensim.log') as f:
    matches = [p.findall(line) for line in f]
matches = [m for m in matches if len(m) > 0]
tuples = [t[0] for t in matches]
perplexity = [float(t[1]) for t in tuples]
likelihood = [float(t[0]) for t in tuples]
# The step of 10 matches eval_every=10 used when fitting the model.
iterations = list(range(0, len(tuples) * 10, 10))
plt.plot(iterations, likelihood, c="black")
plt.ylabel("log likelihood")
plt.xlabel("iteration")
plt.title("Topic Model Convergence")
plt.grid()
plt.savefig("convergence_likelihood.pdf")
plt.close()
```
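If you would rather get a number than eyeball the plot, the parsing step above can be wrapped in a function and followed by a simple plateau check. This is only a sketch: `parse_log`, `first_plateau`, the tolerance, and the window size are hypothetical choices of mine, and the sample lines are synthetic, written to imitate the log format configured earlier rather than copied from a real run.

```python
import re

# Same pattern as in the answer, compiled as a raw string.
LINE_RE = re.compile(r"(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")


def parse_log(lines):
    """Return (likelihood, perplexity) lists from an iterable of log lines."""
    likelihood, perplexity = [], []
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # skip lines that are not perplexity evaluations
            likelihood.append(float(m.group(1)))
            perplexity.append(float(m.group(2)))
    return likelihood, perplexity


def first_plateau(values, tol=0.01, window=3):
    """Index where the series first stays flat, or None if it never does.

    "Flat" means `window` consecutive absolute changes smaller than `tol`.
    """
    run = 0
    for i in range(1, len(values)):
        run = run + 1 if abs(values[i] - values[i - 1]) < tol else 0
        if run >= window:
            return i - window
    return None


# Synthetic lines (made-up values) in the "%(asctime)s:%(levelname)s:%(message)s" format.
SAMPLE_LINES = [
    "2016-06-01 13:06:00,000:INFO:some unrelated message",
    "2016-06-01 13:06:01,000:INFO:-9.000 per-word bound, 512.0 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
    "2016-06-01 13:06:02,000:INFO:-8.000 per-word bound, 256.0 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
    "2016-06-01 13:06:03,000:INFO:-7.500 per-word bound, 181.0 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
    "2016-06-01 13:06:04,000:INFO:-7.499 per-word bound, 180.9 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
    "2016-06-01 13:06:05,000:INFO:-7.498 per-word bound, 180.8 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
    "2016-06-01 13:06:06,000:INFO:-7.498 per-word bound, 180.8 perplexity estimate based on a held-out corpus of 100 documents with 5000 words",
]

likelihood, perplexity = parse_log(SAMPLE_LINES)
plateau_index = first_plateau(likelihood)
print(likelihood)
print(plateau_index)
```

With a real log, pass the open file object to `parse_log` in place of `SAMPLE_LINES`. A sensible tolerance depends on your corpus and on how noisy the held-out estimate is, so treat the defaults here as placeholders.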

141

answered Sep 22 '22 19:09

groceryheist

Tags: python, gensim, lda, convergence
