How to get the document-topic distribution of all documents in gensim LDA?

I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code:

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# docs is assumed to be the preprocessed input: a list of token lists
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

num_topics = 10
chunksize = 2000
passes = 20
iterations = 400
eval_every = None  # skip perplexity evaluation during training
temp = dictionary[0]  # accessing an item makes gensim build the lazy id2token mapping
id2word = dictionary.id2token
model = LdaModel(corpus=corpus, id2word=id2word, chunksize=chunksize,
                 alpha='auto', eta='auto',
                 random_state=42,
                 iterations=iterations, num_topics=num_topics,
                 passes=passes, eval_every=eval_every)

I want the topic distribution for every document, i.e. the probabilities of all 10 topics for each document, but when I use:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics)

The output is only:

<gensim.interfaces.TransformedCorpus object at 0x000001DF28708E10>

How do I get the topic distribution of every document?

asked Nov 15 '18 06:11

wayne64001

1 Answer

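The function get_document_topics accepts either a single document in BOW format or a whole corpus. You're calling it on the full corpus (a list of documents), so it returns a lazy iterable (a TransformedCorpus) that yields the scores for each document as you iterate over it. That is why print shows only the object rather than the values.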

You have a few options. If you just want one document, run it on the document you want the values for:

get_document_topics = model.get_document_topics(corpus[0])

Or do the following to get a list of scores for all the documents:

get_document_topics = [model.get_document_topics(item) for item in corpus]

Or index into the object returned by your original code:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics[0])
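
Note that each result is a sparse list of (topic_id, probability) pairs, and topics whose probability falls below the model's minimum_probability threshold are left out, so a document may come back with fewer than 10 entries. Passing minimum_probability=0 to get_document_topics should return practically every topic. If you want a fixed-length row per document, a small helper like the one below can fill in the gaps. This is only a sketch: to_dense and the sample data are hypothetical names made up for illustration, not part of gensim.

```python
def to_dense(doc_topics, num_topics):
    """Expand sparse (topic_id, probability) pairs into a fixed-length list."""
    row = [0.0] * num_topics  # topics missing from the sparse output stay at 0.0
    for topic_id, prob in doc_topics:
        row[topic_id] = float(prob)
    return row


# Hypothetical sample shaped like the output of model.get_document_topics(bow)
sample = [(0, 0.7), (3, 0.3)]
dense_row = to_dense(sample, num_topics=5)
print(dense_row)

# With the model and corpus from the question, the usage would look like this
# (commented out because it needs the trained model):
# all_topics = model.get_document_topics(corpus, minimum_probability=0)
# matrix = [to_dense(doc, num_topics) for doc in all_topics]
```

The resulting matrix would have one row per document and one column per topic, which is usually easier to feed into pandas or NumPy than the sparse pairs.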

answered Oct 28 '22 10:10

Andrew McDowell

Tags: python-3.x, gensim, lda, topic-modeling, probability-distribution
