Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?
Strictly speaking, Latent Dirichlet Allocation (LDA) is not a clustering algorithm. A clustering algorithm such as k-means assigns each item to exactly one group, whereas LDA gives each item a probability distribution over groups (topics).
Latent Dirichlet Allocation (LDA) is an unsupervised technique that is commonly used for text analysis and is often loosely described as clustering. It's a type of topic modeling in which each topic is represented as a distribution over words, and each document is represented as a mixture of these topics.
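The two main inputs to the LDA topic model in gensim are the dictionary (a mapping between word ids and words) and the corpus (each document stored as a list of (word id, count) pairs). For example, a pair (0, 1) in the first document means that word id 0 (the word 'able' in the original example) occurs once in that document.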
The Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. LDA is most commonly used to discover a user-specified number of topics shared by documents within a text corpus.
LDA produces a lower-dimensional representation of the documents in a corpus: each document becomes a vector of topic weights. You could apply a clustering algorithm such as k-means to this low-dimensional representation. Since each axis corresponds to a topic, a simpler approach is to assign each document to the topic on which its weight is largest.