Hierarchical Dirichlet Process Gensim topic number independent of corpus size

I am using the Gensim HDP module on a set of documents.

>>> hdp = models.HdpModel(corpusB, id2word=dictionaryB)
>>> topics = hdp.print_topics(topics=-1, topn=20)
>>> len(topics)
150
>>> hdp = models.HdpModel(corpusA, id2word=dictionaryA)
>>> topics = hdp.print_topics(topics=-1, topn=20)
>>> len(topics)
150
>>> len(corpusA)
1113
>>> len(corpusB)
17

Why is the number of topics independent of corpus length?

asked Jul 21 '15 15:07

user0

1 Answer

@Aaron's code above is broken due to gensim API changes. I rewrote and simplified it as follows. It works as of June 2017 with gensim v2.1.0.

import pandas as pd

def topic_prob_extractor(gensim_hdp):
    # Each entry is (topic_id, [(word, weight), ...]) for that topic's top words.
    shown_topics = gensim_hdp.show_topics(num_topics=-1, formatted=False)
    topics_nos = [topic_id for topic_id, _ in shown_topics]
    # Sum the weights of the displayed top words as a rough per-topic weight.
    weights = [sum(weight for _, weight in words) for _, words in shown_topics]

    return pd.DataFrame({'topic_id': topics_nos, 'weight': weights})
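
The same idea can be tried without training a model. This is a minimal sketch, assuming `show_topics(num_topics=-1, formatted=False)` returns a list shaped like `[(topic_id, [(word, weight), ...]), ...]`. `FakeHdp`, `topic_weights`, and all the numbers are made-up stand-ins for illustration, not gensim objects. As far as I know, gensim's `HdpModel` truncates the top-level topic count at `T=150` by default, which would explain why both corpora report 150 topics; the summed top-word weights are only a rough hint at which of those topics matter.

```python
# Hypothetical stand-in for a trained gensim HdpModel; the numbers are invented.
class FakeHdp:
    def show_topics(self, num_topics=-1, formatted=False):
        return [
            (0, [("cat", 0.30), ("dog", 0.20)]),
            (1, [("tax", 0.05), ("law", 0.01)]),
            (2, [("sun", 0.10), ("sky", 0.15)]),
        ]


def topic_weights(hdp):
    """Return (topic_id, summed top-word weight) pairs, heaviest first."""
    shown = hdp.show_topics(num_topics=-1, formatted=False)
    pairs = [(topic_id, sum(w for _, w in words)) for topic_id, words in shown]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = topic_weights(FakeHdp())
for topic_id, weight in ranked:
    print(topic_id, round(weight, 3))
```

With a real model, topics whose summed weight is close to zero are candidates to ignore, though where to draw that line is a judgment call.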

answered Oct 03 '22 06:10

Roko Mijic

Tags:

python

nlp

gensim

lda
