 

Latent Dirichlet Allocation Solution Example

I am trying to learn about Latent Dirichlet Allocation (LDA). I have basic knowledge of machine learning and probability theory, and based on this blog post http://goo.gl/ccPvE I was able to develop an intuition for LDA. However, I still don't have a complete understanding of the various calculations that go into it. Can someone show me the calculations using a very small corpus (say, 3-5 sentences and 2-3 topics)?

asked May 16 '12 by user737128

People also ask

How do you implement Latent Dirichlet Allocation?

For the LDA model, we first need to build a dictionary of words where each word is given a unique id. Then we need to create a corpus in which each document is a list of (word_id, word_frequency) pairs. Finally, train the model. Coherence measures the relative distance between words within a topic.
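For instance, here is a minimal sketch of those steps in Python using gensim (the library choice and the toy documents are assumptions; the passage does not name either):

    # A minimal sketch of the steps above, assuming gensim.
    from gensim.corpora import Dictionary
    from gensim.models import CoherenceModel, LdaModel

    docs = [
        ["broccoli", "banana", "eat"],
        ["banana", "spinach", "smoothie", "breakfast"],
        ["chinchilla", "kitten", "cute"],
    ]

    # 1. Dictionary: every word gets a unique integer id.
    dictionary = Dictionary(docs)

    # 2. Corpus: each document becomes a list of (word_id, word_frequency) pairs.
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # 3. Train the model.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    print(lda.show_topics())

    # Coherence scores how well the words within each topic hang together.
    print(CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                         coherence="c_v").get_coherence())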

What is a good explanation of latent Dirichlet allocation?

Latent Dirichlet Allocation (LDA) is a popular form of statistical topic modeling. In LDA, documents are represented as a mixture of topics, and a topic is a bunch of words. Those topics reside within a hidden, also known as latent, layer.


1 Answer

Edwin Chen (who works at Twitter, btw) has an example on his blog, with 5 sentences and 2 topics:

  • I like to eat broccoli and bananas.
  • I ate a banana and spinach smoothie for breakfast.
  • Chinchillas and kittens are cute.
  • My sister adopted a kitten yesterday.
  • Look at this cute hamster munching on a piece of broccoli.

Then he does some "calculations":

  • Sentences 1 and 2: 100% Topic A
  • Sentences 3 and 4: 100% Topic B
  • Sentence 5: 60% Topic A, 40% Topic B

And takes guesses at the topics:

  • Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, …
    • at which point, you could interpret topic A to be about food
  • Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, …
    • at which point, you could interpret topic B to be about cute animals
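Those per-topic percentages are just normalized word-topic counts. Here is a toy illustration in Python (the counts are invented so that they reproduce Chen's numbers; in real LDA they would be estimated from the data):

    # Hypothetical word counts assigned to Topic A (invented for illustration).
    topic_a_counts = {"broccoli": 6, "bananas": 3, "breakfast": 2, "munching": 2,
                      "eat": 2, "smoothie": 2, "spinach": 2, "piece": 1}
    total = sum(topic_a_counts.values())  # 20
    for word, count in sorted(topic_a_counts.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {count / total:.0%}")  # broccoli: 30%, bananas: 15%, ...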

Your question is: how did he come up with those numbers? First, notice which words in these sentences carry "information":

  • broccoli, bananas, smoothie, breakfast, munching, eat
  • chinchilla, kitten, cute, adopted, hamster

Now let's go sentence by sentence, counting words from each topic:

  • food 3, cute 0 --> food
  • food 5, cute 0 --> food
  • food 0, cute 3 --> cute
  • food 0, cute 2 --> cute
  • food 2, cute 2 --> 50% food + 50% cute

So my numbers differ slightly from Chen's; maybe he counts the word "piece" in "piece of broccoli" towards food.
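That counting can be done mechanically. A sketch in Python (note: "spinach" is added to the food list as an assumption, since it is needed to reach the count of 5 for sentence 2):

    # Count the "information-carrying" words per sentence and normalize
    # the counts into a topic mixture.
    food = {"eat", "ate", "broccoli", "bananas", "banana", "smoothie",
            "breakfast", "munching", "spinach"}  # spinach: an assumption
    cute = {"chinchillas", "kittens", "kitten", "cute", "adopted", "hamster"}

    sentences = [
        "I like to eat broccoli and bananas",
        "I ate a banana and spinach smoothie for breakfast",
        "Chinchillas and kittens are cute",
        "My sister adopted a kitten yesterday",
        "Look at this cute hamster munching on a piece of broccoli",
    ]

    for s in sentences:
        words = s.lower().split()
        f = sum(w in food for w in words)
        c = sum(w in cute for w in words)
        print(f"food {f}, cute {c} --> {f / (f + c):.0%} food, {c / (f + c):.0%} cute")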


We made two calculations in our heads:

  • to look at the sentences and come up with 2 topics in the first place. LDA does this by considering each sentence as a "mixture" of topics and guessing the parameters of each topic (a minimal sampler sketch follows below).
  • to decide which words are important. LDA handles this implicitly: words that appear in every document (like "I", "and", "a") end up spread evenly across topics and carry little topical weight, the same intuition behind "term-frequency/inverse-document-frequency" weighting.
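To make the first point concrete, here is a minimal collapsed Gibbs sampler, one standard way of fitting LDA's topic parameters (this is a sketch of the general algorithm, not Chen's exact procedure; alpha, beta, and iters are illustrative defaults):

    import random
    from collections import defaultdict

    def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=500):
        """Collapsed Gibbs sampling for LDA on tokenized docs."""
        vocab_size = len({w for doc in docs for w in doc})
        # z[d][i] = topic currently assigned to word i of document d.
        z = [[random.randrange(n_topics) for _ in doc] for doc in docs]
        doc_topic = [[0] * n_topics for _ in docs]                # doc -> topic counts
        topic_word = [defaultdict(int) for _ in range(n_topics)]  # topic -> word counts
        topic_total = [0] * n_topics
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    # Remove the word's current assignment from all counts...
                    t = z[d][i]
                    doc_topic[d][t] -= 1
                    topic_word[t][w] -= 1
                    topic_total[t] -= 1
                    # ...then resample: P(topic k) is proportional to
                    # (how much doc d uses k) * (how much k uses word w).
                    weights = [(doc_topic[d][k] + alpha) *
                               (topic_word[k][w] + beta) /
                               (topic_total[k] + beta * vocab_size)
                               for k in range(n_topics)]
                    t = random.choices(range(n_topics), weights=weights)[0]
                    z[d][i] = t
                    doc_topic[d][t] += 1
                    topic_word[t][w] += 1
                    topic_total[t] += 1
        # doc_topic rows give each sentence's topic mixture;
        # topic_word gives each topic's (unnormalized) word distribution.
        return doc_topic, topic_word

Running it on Chen's five sentences with n_topics=2 should roughly recover the food/cute split, up to sampling noise and the topics' labels being swapped.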
answered Sep 22 '22 by john mangual