 

How to get a document-topic distribution from the structural topic model (stm) R package?

When I use Python's scikit-learn for LDA topic modeling, I can call the transform method on the fitted model to get the "document-topic distribution" of the LDA results, like this:

document_topic_distribution = lda_model.transform(document_term_matrix)

Now I have also tried the R structural topic model (stm) package, and I want to get the same thing. Is there a function in the stm package that produces a document-topic distribution? I created the stm object as follows:

stm_model <- stm(documents = out$documents, vocab = out$vocab,
                       K = number_of_topics, data = out$meta, 
                       max.em.its = 75, init.type = "Spectral" )

However, I couldn't figure out how to get the desired distribution out of this object, and the documentation didn't help much either.

Asked by rakael on Feb 06 '26 at 00:02


1 Answer

As emilliman5 pointed out, your stm_model object gives you direct access to the fitted model's parameters as list elements (for example stm_model$theta), as described in the stm documentation.

Indeed, the documentation describes the theta element as a "Number of Documents by Number of Topics matrix of topic proportions."

This phrasing takes a little unpacking: theta is an N_DOCS by N_TOPICS matrix, i.e. it has N_DOCS rows, one per document, and N_TOPICS columns, one per topic. The values are topic proportions, so each row sums to 1. For example, if stm_model$theta[1, ] == c(.3, .2, .5), then Document 1 is 30% Topic 1, 20% Topic 2 and 50% Topic 3. This is the same shape as the matrix returned by sklearn's transform.
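To make the row/column reading concrete, here is a small plain-Python sketch. The matrix below is a hypothetical stand-in for stm_model$theta with made-up values (not output from a real model), and describe_document is a hypothetical helper; it reads one document's row the way the paragraph above does and checks that every row sums to 1.

```python
# Hypothetical 3-document x 3-topic matrix standing in for stm_model$theta.
# Rows = documents, columns = topics; values are illustrative only.
theta = [
    [0.3, 0.2, 0.5],
    [0.6, 0.1, 0.3],
    [0.1, 0.7, 0.2],
]

def describe_document(theta, doc):
    """Return a readable breakdown of one document's topic proportions.

    `doc` is 1-based to match R's theta[doc, ] indexing.
    """
    row = theta[doc - 1]  # convert R-style 1-based index to Python's 0-based
    return ", ".join(f"{p:.0%} Topic {k}" for k, p in enumerate(row, start=1))

# Each row is a distribution over topics, so its entries should sum to 1.
row_sums = [sum(row) for row in theta]

print(describe_document(theta, 1))  # 30% Topic 1, 20% Topic 2, 50% Topic 3
```

In R you would get the same row with stm_model$theta[1, ] and could check the row sums with rowSums(stm_model$theta).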

To find out which topic dominates a document, you need the (column!) index of the maximum value in that document's row. You can get this by calling apply with MARGIN=1, which means "apply the function row-wise", together with which.max, which returns the index of the maximum value:

apply(stm_model$theta, MARGIN=1, FUN=which.max)
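For readers coming from Python, the apply/which.max call is a row-wise argmax. The sketch below mirrors it in plain Python on a hypothetical matrix (made-up values, not real model output); dominant_topics is a hypothetical helper name. It returns 1-based indices so the numbers line up with R's output, and, like which.max, it picks the first maximum when there is a tie.

```python
# Hypothetical document-topic matrix (rows = documents, columns = topics),
# standing in for stm_model$theta or sklearn's lda_model.transform(...) output.
theta = [
    [0.3, 0.2, 0.5],
    [0.6, 0.1, 0.3],
    [0.1, 0.7, 0.2],
]

def dominant_topics(theta):
    """Row-wise argmax returning 1-based topic indices, like R's which.max.

    Python's max() returns the first maximal element on ties, matching
    which.max's "first maximum" behaviour.
    """
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in theta]

print(dominant_topics(theta))  # [3, 1, 2]
```

With NumPy the equivalent would be theta.argmax(axis=1), which returns 0-based indices instead.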
Answered by Oliver Baumann on Feb 08 '26 at 12:02