How to access topic words only in gensim

I built an LDA model using Gensim and I want to get only the topic words. How can I get just the words of each topic, with no probabilities and no IDs?

I tried the print_topics() and show_topics() functions in gensim, but I can't get clean words.

This is the code I used:

import gensim
from gensim import corpora

# doc_clean is a list of tokenized documents (lists of strings)
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=12, id2word=dictionary, passes=100, alpha='auto', update_every=5)
# print_topics returns (topic_id, "weight*word + ...") tuples; i[1] is the formatted string
x = ldamodel.print_topics(num_topics=12, num_words=5)
for i in x:
    print(i[1])

0.045*تعرض + 0.045*الماضية + 0.045*السنوات + 0.045*وءسرته + 0.045*لءحمد
0.021*مصر + 0.021*الديمقراطية + 0.021*حرية + 0.021*باسم + 0.021*الحكومة
0.068*المواطنة + 0.068*الطاءفية + 0.068*وانهيارات + 0.068*رابطة + 0.005*طبول
0.033*عربية + 0.033*انكسارات + 0.033*رهابيين + 0.033*بحقوق + 0.033*ل
0.007*وحريات + 0.007*ممنهج + 0.007*قواءم + 0.007*الناس + 0.007*دراج
0.116*طبول + 0.116*الوطنية + 0.060*يكتب + 0.060*مصر + 0.005*عربية
0.064*قيم + 0.064*وهن + 0.064*عربيا + 0.064*والتعددية + 0.064*الديمقراطية
0.036*تضامنا + 0.036*الشخصية + 0.036*مع + 0.036*التفتيش + 0.036*الءخلاق
0.052*تضامنا + 0.052*كل + 0.052*محمد + 0.052*الخلوق + 0.052*مظلوم
0.034*بمواطنين + 0.034*رهابية + 0.034*لم + 0.034*عليهم + 0.034*يثبت
0.035*مع + 0.035*ومستشار + 0.035*يستعيدا + 0.035*ءرهقهما + 0.035*حريتهما
0.064*للقمع + 0.064*قريبة + 0.064*لا + 0.064*نهاية + 0.064*مصر

I also tried show_topics(), and it gave the same output:

import numpy as np

# show_topics() (formatted=True by default) returns the same (topic_id, string) tuples
y = np.array(ldamodel.show_topics(num_topics=12, num_words=5))
for i in y[:, 1]:
    print(i)

If I have a topic ID, how can I access its words and other information?

Thanks in advance.

asked Oct 03 '17 01:10

2 Answers

I think the code snippet below should give you a list of tuples, each containing a topic ID (tp) and the corresponding list of words (wd) in that topic.

x = ldamodel.show_topics(num_topics=12, num_words=5, formatted=False)
# With formatted=False each entry is (topic_id, [(word, probability), ...])
topics_words = [(tp[0], [wd[0] for wd in tp[1]]) for tp in x]

# Print each topic ID with its words
for topic, words in topics_words:
    print(str(topic) + "::" + str(words))
print()

# Print only the words
for topic, words in topics_words:
    print(" ".join(words))
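To make the data shape explicit, here is a minimal sketch of the same idea that runs without a trained model. It assumes that show_topics(..., formatted=False) returns a list of (topic_id, [(word, probability), ...]) pairs; sample_topics is a hand-written stand-in built from the first two lines of the question's output (labeled 0 and 1 here for illustration), and words_by_topic and weights_by_topic are hypothetical helper names. Keying the result by topic ID also covers the follow-up about looking up one topic's words.

```python
# Hypothetical stand-in for ldamodel.show_topics(num_topics=12, num_words=5, formatted=False):
# a list of (topic_id, [(word, probability), ...]) pairs copied from the question's output.
sample_topics = [
    (0, [("تعرض", 0.045), ("الماضية", 0.045), ("السنوات", 0.045), ("وءسرته", 0.045), ("لءحمد", 0.045)]),
    (1, [("مصر", 0.021), ("الديمقراطية", 0.021), ("حرية", 0.021), ("باسم", 0.021), ("الحكومة", 0.021)]),
]


def words_by_topic(topics):
    """Map each topic ID to its list of words, dropping the probabilities."""
    return {topic_id: [word for word, _prob in pairs] for topic_id, pairs in topics}


def weights_by_topic(topics):
    """Map each topic ID to a {word: probability} dict, for when the weights are needed."""
    return {topic_id: dict(pairs) for topic_id, pairs in topics}


topic_words = words_by_topic(sample_topics)
topic_weights = weights_by_topic(sample_topics)

# Look up a single topic by its ID
print(topic_words[1])
print(" ".join(topic_words[0]))
print(topic_weights[1]["مصر"])
```

With a trained model, gensim's LdaModel.show_topic(topicid, topn=10) returns the (word, probability) pairs for a single topic, so [word for word, _ in ldamodel.show_topic(3, topn=5)] should give just the words of topic 3.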

answered Sep 19 '22 09:09

The other answer was giving a string with weights associated with each word. If you want the words of each topic without the weights for further work, you can try this. Here the topic number is the key of the dictionary, and the value is a single string containing all the words in that topic, separated by spaces.

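import re

# Pass num_topics explicitly: the default (10) would return only 10 of the 12 topics
x = ldamodel.show_topics(num_topics=12)

twords = {}
for topic, word in x:
    # Turn weights and separators (digits, '.', '*', '+', '"') into spaces, then collapse the
    # whitespace; a pattern like [^A-Za-z ] would also delete non-Latin words such as Arabic
    twords[topic] = " ".join(re.sub(r'[\d.*+"]+', " ", word).split())
print(twords)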
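Another way to handle the formatted strings is to capture each word rather than delete everything else. A minimal sketch, assuming each term has the form weight*word and terms are joined by " + " (depending on the gensim version, the word may be wrapped in double quotes, so the quotes are optional in the pattern); TERM_RE and topic_string_to_words are hypothetical names, and the sample string is copied from the first line of the question's output.

```python
import re

# Matches one term such as 0.045*word or 0.045*"word"; group 1 captures the word itself.
# Assumption: words contain no '+' or '"' characters.
TERM_RE = re.compile(r'[\d.]+\*"?([^"+]+?)"?\s*(?:\+|$)')


def topic_string_to_words(topic_string):
    """Return the words of one formatted topic string, in order, without weights."""
    return TERM_RE.findall(topic_string)


# First line of the question's output (unquoted format)
sample = "0.045*تعرض + 0.045*الماضية + 0.045*السنوات + 0.045*وءسرته + 0.045*لءحمد"
print(topic_string_to_words(sample))

# Hypothetical string in the quoted format
print(topic_string_to_words('0.016*"car" + 0.014*"power"'))
```

Applied to a model, {topic_id: topic_string_to_words(s) for topic_id, s in ldamodel.print_topics(num_topics=12, num_words=5)} would map each topic ID to its word list. A word that itself contains a + or a double quote would be split incorrectly by this pattern.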

answered Sep 21 '22 09:09

Sreehari P.V

Tags: python, nlp, gensim, lda, topic-modeling

Asked by: Muhammed Eltabakh
Answered by: oldmonk, Sreehari P.V