
Predicting LDA topics for new data

It looks like this question may have been asked a few times before (here and here), but it has yet to be answered. I'm hoping this is due to the previous ambiguity of the question(s) asked, as indicated by comments. I apologize if I am breaking protocol by asking a similar question again; I just assumed that those questions would not be seeing any new answers.

Anyway, I am new to Latent Dirichlet Allocation and am exploring its use as a means of dimension reduction for textual data. Ultimately I would like to extract a smaller set of topics from a very large bag of words and build a classification model using those topics as a few of the variables in the model. I've had success in running LDA on a training set, but the problem I am having is predicting which of those same topics appear in some other test set of data. I am using R's topicmodels package right now, but if there is another way to do this using some other package I am open to that as well.

Here is an example of what I am trying to do:

library(topicmodels)
data(AssociatedPress)

train <- AssociatedPress[1:100]
test <- AssociatedPress[101:150]

train.lda <- LDA(train, 5)
topics(train.lda)

# How can I predict the most likely topic(s) from "train.lda"
# for each document in "test"?
Asked Apr 20 '13 by David

People also ask

How do I choose the number of topics for LDA?

To decide on a suitable number of topics, you can compare the goodness-of-fit of LDA models fit with varying numbers of topics. You can evaluate the goodness-of-fit of an LDA model by calculating the perplexity of a held-out set of documents. The perplexity indicates how well the model describes a set of documents.
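A minimal sketch of this comparison, assuming R's topicmodels package (its perplexity() method accepts a held-out document-term matrix) and reusing the AssociatedPress data from the question; the candidate values of k and the seed are arbitrary choices for illustration:

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

train    <- AssociatedPress[1:100, ]
held_out <- AssociatedPress[101:150, ]

# Fit a model for each candidate k and compare held-out perplexity;
# lower perplexity means the model describes the held-out documents better.
for (k in c(2, 5, 10)) {
  fit <- LDA(train, k = k, control = list(seed = 1))
  cat("k =", k, " perplexity:", perplexity(fit, held_out), "\n")
}
```

Because the fitted models depend on a random initialization, fixing the seed makes the comparison reproducible across runs.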

What is the optimal number of topics for LDA in Python?

How do you find the optimum number of topics? One approach is to build many LDA models with different numbers of topics and pick the one that gives the highest coherence value. If you see the same keywords being repeated in multiple topics, it's probably a sign that k is too large.

How do you train LDA?

In order to train an LDA model you need to provide a fixed, assumed number of topics across your corpus. There are a number of ways you could approach this: run LDA on your corpus with different numbers of topics and see if the word distribution per topic looks sensible.
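One way to eyeball the word distribution per topic in topicmodels is terms(), which lists the most probable terms for each topic. A sketch, reusing the AssociatedPress data from the question (the choice of k = 5 and the seed are illustrative):

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

fit <- LDA(AssociatedPress[1:100, ], k = 5, control = list(seed = 1))

# Top 10 terms per topic, one column per topic; check whether each
# column reads like a coherent theme.
terms(fit, 10)
```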


1 Answer

With the help of Ben's superior document reading skills, I believe this is possible using the posterior() function.

library(topicmodels)
data(AssociatedPress)

train <- AssociatedPress[1:100]
test <- AssociatedPress[101:150]

train.lda <- LDA(train, 5)
(train.topics <- topics(train.lda))
#  [1] 4 5 5 1 2 3 1 2 1 2 1 3 2 3 3 2 2 5 3 4 5 3 1 2 3 1 4 4 2 5 3 2 4 5 1 5 4 3 1 3 4 3 2 1 4 2 4 3 1 2 4 3 1 1 4 4 5
# [58] 3 5 3 3 5 3 2 3 4 4 3 4 5 1 2 3 4 3 5 5 3 1 2 5 5 3 1 4 2 3 1 3 2 5 4 5 5 1 1 1 4 4 3

test.topics <- posterior(train.lda, test)
(test.topics <- apply(test.topics$topics, 1, which.max))
#  [1] 3 5 5 5 2 4 5 4 2 2 3 1 3 3 2 4 3 1 5 3 5 3 1 2 2 3 4 1 2 2 4 4 3 3 5 5 5 2 2 5 2 3 2 3 3 5 5 1 2 2
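Since the stated goal was to use topics as variables in a classification model, it may be worth keeping the full documents-by-topics probability matrix that posterior() returns, rather than collapsing each test document to its single most likely topic. A sketch along those lines (the seed is an arbitrary choice for reproducibility):

```r
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

train <- AssociatedPress[1:100, ]
test  <- AssociatedPress[101:150, ]

train.lda <- LDA(train, k = 5, control = list(seed = 1))

# posterior() returns, among other things, a documents-by-topics
# matrix of probabilities; each row sums to 1.
post     <- posterior(train.lda, test)
features <- post$topics                 # 50 x 5 matrix

# Sanity check: rows are probability distributions over topics.
stopifnot(all(abs(rowSums(features) - 1) < 1e-8))

# These columns can be fed directly into a downstream classifier
# as predictors, preserving topic-mixture information that
# which.max would discard.
head(features)
```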
Answered Dec 14 '22 by David