Should I use a TF-IDF corpus or a plain bag-of-words corpus when inferring document topics with LDA?

Question

I am wondering whether I should pass a TF-IDF corpus or the plain bag-of-words corpus when inferring topics for documents with LDA in gensim.

Here is an example:

from gensim import corpora, models
import numpy.random
numpy.random.seed(10)

doc0 = [(0, 1), (1, 1)]
doc1 = [(0,1)] 
doc2 = [(0, 1), (1, 1)]
doc3 = [(0, 3), (1, 1)]

corpus = [doc0,doc1,doc2,doc3]
# corpus already holds (token_id, count) pairs, so build the mapping from it
dictionary = corpora.Dictionary.from_corpus(corpus)

tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
# save() only pickles the wrapper object; serialize() writes the vectors to disk
corpora.MmCorpus.serialize('x.corpus_tfidf', corpus_tfidf)

corpus_tfidf = corpora.MmCorpus('x.corpus_tfidf')

lda = models.ldamodel.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)

# Which of these should I use?
corpus_lda = lda[corpus]          # this one
corpus_LDA = lda[corpus_tfidf]    # or this one?


corpora.MmCorpus.serialize('x.corpus_lda', corpus_lda)

for i, j in enumerate(corpus_lda):
    print(j, corpus[i])

MrFancypants · Accepted Answer

According to the gensim mailing list (the last post in the thread in particular), the standard procedure is to use a bag-of-words corpus. You can use a TF-IDF corpus, but it is unclear what effect this would have.

Tags: python, gensim, lda

Nipun Alahakoon
