The relationship between latent Dirichlet allocation and document clustering

I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering.

LDA outputs the topic proportions for each document. If my understanding is correct, this is not in itself a clustering of the documents. However, we can treat these topic proportions as a feature representation for each document, and then apply any established clustering method to the features generated by LDA.

Is my understanding correct? Thanks.

asked Jul 07 '11 14:07

user785099

1 Answer

Yes, you can treat the output of LDA as features for your documents; this is exactly what Blei, Ng and Jordan did in the paper that introduced LDA. They did it for classification, but for clustering the procedure is the same.

(In machine learning terminology, this use of LDA is called dimensionality reduction because it reduces the feature space's number of dimensions from |V|, the vocabulary size, to some number k of topics selected by the user.)
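As a rough illustration of that pipeline (not taken from the paper), here is a minimal sketch using only the Python standard library. The `lda_gibbs` and `kmeans` functions, the hyperparameter values and the toy corpus are all assumptions made for this example: a small collapsed Gibbs sampler stands in for a real LDA implementation, and a bare-bones k-means stands in for whatever clustering method you prefer.

```python
import random


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs is a list of token lists; returns one k-dimensional
    topic-proportion vector per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * k for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(k)]   # word counts per topic
    nk = [0] * k                        # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):      # random initial assignments
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], wid[w]
                # Remove the token, then resample its topic given all others.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + k * alpha) for c in ndk[d]]
        for d, doc in enumerate(docs)
    ]


def kmeans(points, k, iters=20, seed=0):
    """Bare-bones k-means on lists of floats; returns a label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: three "pets" documents, three "finance" documents.
docs = [
    "cat dog pet animal cat".split(),
    "dog pet animal vet".split(),
    "cat pet vet animal dog".split(),
    "stock market trade price".split(),
    "market price stock invest trade".split(),
    "invest stock trade market".split(),
]

theta = lda_gibbs(docs, k=2)   # features: one topic-proportion vector per doc
labels = kmeans(theta, k=2)    # cluster the documents in topic space
print(labels)
```

Each document starts out as a bag of words over a ten-word vocabulary and ends up as a 2-dimensional vector in `theta`, which is the dimensionality reduction described above. In practice you would replace both helper functions with a tested library implementation; the structure of the pipeline stays the same.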

answered Oct 05 '22 11:10

Fred Foo

Tags: machine-learning, nlp, text-mining, data-mining, lda
