I am using the 'lda' package in R for topic modeling. I want to use an already fitted Latent Dirichlet Allocation (LDA) model to predict topics (collections of related words in a document) for a new dataset. In the process, I came across the predictive.distribution() function, but it takes document_sums as an input parameter, and document_sums is itself an output of fitting a model, so it is not available for documents the model has not seen. I need help understanding how to apply an existing model to a new dataset and predict topics. Here is the example code from the package documentation, written by Jonathan Chang:
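# Fit a model
library(lda)
data(cora.documents)
data(cora.vocab)
K <- 10  # Num clusters (topics)
# Arguments after the vocabulary: 25 Gibbs iterations, alpha = 0.1, eta = 0.1
result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab, 25, 0.1, 0.1)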
# Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[,1:2], result$topics, 0.1, 0.1)
# Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)
Any help will be appreciated
Thanks & Regards,
Ankit
I don't know how you can achieve this in R, but please have a look at the 2009 publication by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 mentions three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.
MALLET has an implementation of the left-to-right estimator.
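For the R side of the question, one simpler alternative to the estimators above is to "fold in" a new document: keep the fitted topic-word counts fixed and Gibbs-sample only the topic assignments of the new document's words. The following is a minimal base-R sketch of that idea, not the left-to-right estimator and not part of the 'lda' package. The helper name fold_in_document, its defaults, and the toy data are hypothetical; it assumes the new document is in the package's 2 x N format (0-based word indices in row 1, counts in row 2) and uses the same vocabulary as the fitted model.

```r
# Hypothetical helper (NOT part of the 'lda' package): estimate topic counts
# for one new document while holding the fitted topic-word counts fixed.
#   doc:    2 x N integer matrix in the 'lda' document format
#           (row 1 = 0-based vocabulary indices, row 2 = word counts)
#   topics: K x V matrix of topic-word counts, e.g. result$topics
fold_in_document <- function(doc, topics, alpha = 0.1, eta = 0.1, n_iter = 50) {
  K <- nrow(topics)
  # Smoothed p(word | topic); never updated during sampling
  phi <- (topics + eta) / rowSums(topics + eta)
  # Expand the document into one entry per word occurrence (1-based indices)
  words <- rep(doc[1, ] + 1L, times = doc[2, ])
  # Random initial topic for every token, and the resulting topic counts
  z <- sample.int(K, length(words), replace = TRUE)
  doc_sums <- tabulate(z, nbins = K)
  for (iter in seq_len(n_iter)) {
    for (i in seq_along(words)) {
      doc_sums[z[i]] <- doc_sums[z[i]] - 1L      # remove token i from the counts
      p <- (doc_sums + alpha) * phi[, words[i]]  # unnormalised p(topic | rest)
      z[i] <- sample.int(K, 1L, prob = p)        # resample the topic of token i
      doc_sums[z[i]] <- doc_sums[z[i]] + 1L      # add token i back
    }
  }
  doc_sums  # length-K vector, analogous to one column of document_sums
}

# Toy data (made up for illustration): 2 topics over a 4-word vocabulary
set.seed(1)
toy_topics <- matrix(c(20, 20,  0,  0,
                        0,  0, 20, 20), nrow = 2, byrow = TRUE)
new_doc <- matrix(c(0L, 1L,    # 0-based word indices
                    3L, 2L),   # counts of each word
                  nrow = 2, byrow = TRUE)
new_sums <- fold_in_document(new_doc, toy_topics)
```

With a real model, the resulting vector could presumably be passed on as a one-column matrix, e.g. predictive.distribution(matrix(new_sums, ncol = 1), result$topics, 0.1, 0.1), since that function expects a K x D matrix of document topic counts. Because the result comes from a single Gibbs sample, averaging over several runs or later iterations would give a more stable estimate.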