LDA TopicModels producing list of numbers rather than terms

Bear with me as I am extremely new to this and working on a project for a course in a certificate program.

I have a .csv dataset that I obtained by retrieving bibliometric records from the PubMed and Embase databases. There are 1034 rows and several columns, but I am trying to create topic models from just one column, the Abstract column, and some records do not have an abstract. I've done some preprocessing (removing stopwords, punctuation, etc.) and have been able to make a bar plot of words occurring more than 200 times, create a frequent-term list by rank, and run word associations for selected words. So it seems R is seeing the words themselves in the Abstract field. My issue comes when I try to create topic models using the topicmodels package. Here's the bit of code I'm using.

#including 1st 3 lines for reference
options(header = FALSE, stringsAsFactors = FALSE, FileEncoding = "latin1")
records <- read.csv("Combined.csv")
AbstractCorpus <- Corpus(VectorSource(records$Abstract))

AbstractTDM <- TermDocumentMatrix(AbstractCorpus)
library(topicmodels)
library(lda)
lda <- LDA(AbstractTDM, k = 8)
(term <- terms(lda, 6))
term <- (apply(term, MARGIN = 2, paste, collapse = ","))

However, the output of topics I get is the following.

     Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7 Topic 8
[1,] "499"   "733"   "390"   "833"   "17"    "413"   "719"   "392"  
[2,] "484"   "655"   "808"   "412"   "550"   "881"   "721"   "61"   
[3,] "857"   "299"   "878"   "909"   "15"    "258"   "47"    "164"  
[4,] "491"   "672"   "313"   "1028"  "126"   "55"    "375"   "987"  
[5,] "734"   "430"   "405"   "102"   "13"    "193"   "83"    "588"  
[6,] "403"   "52"    "489"   "10"    "598"   "52"    "933"   "980"

Why am I seeing numbers here rather than words?

Furthermore, the following code, which I basically took from the R PDF on topicmodels, does produce values for me, but the terms in each topic are still numbers rather than words, which is meaningless to me.

#using information from topicmodels paper
library(tm)
library(topicmodels)
library(lda)
AbstractTM <- list(
  VEM = LDA(AbstractTDM, k = 10, control = list(seed = 505)),
  VEM_fixed = LDA(AbstractTDM, k = 10,
                  control = list(estimate.alpha = FALSE, seed = 505)),
  Gibbs = LDA(AbstractTDM, k = 10, method = "Gibbs",
              control = list(seed = 505, burnin = 100, thin = 10, iter = 100)),
  CTM = CTM(AbstractTDM, k = 10,
            control = list(seed = 505, var = list(tol = 10^-4), em = list(tol = 10^-3)))
)
#To compare the fitted models we first investigate the α values of the
#models fitted with VEM and α estimated and with VEM and α fixed

sapply(AbstractTM[1:2], slot, "alpha")

#Find entropy 
sapply(AbstractTM, function(x)
  mean(apply(posterior(x)$topics, 1, function(z) -sum(z * log(z)))))

#Find estimated topics and terms
Topic <- topics(AbstractTM[["VEM"]], 1)
Topic
#find 5 most frequent terms for each topic
Terms <- terms(AbstractTM[["VEM"]], 5)
Terms[,1:5]

Any thoughts on what the issue might be?

asked Apr 17 '17 02:04

SciLibby

1 Answer

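Reading the topicmodels documentation, it does appear that the LDA() function expects a DocumentTermMatrix, not a TermDocumentMatrix. In a TermDocumentMatrix the rows are terms and the columns are documents, so LDA() ends up treating your documents as the vocabulary, and the "terms" it reports are document numbers (note that none of them exceeds your 1034 rows). Try creating the former with DocumentTermMatrix(AbstractCorpus) and see if that works.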
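Here is a minimal sketch of that change, using a small made-up vector of abstracts in place of records$Abstract. The toy data and the object names (abstracts, corpus, dtm, fit) are assumptions for illustration, not taken from the question. The step that drops empty documents is also an addition: LDA() requires every document to contain at least one term, and the question mentions that some records have no abstract.

```r
library(tm)
library(topicmodels)

# Toy stand-in for records$Abstract (hypothetical data, one empty abstract)
abstracts <- c(
  "gene expression in tumor cells",
  "tumor growth and gene mutation",
  "",
  "patient care and hospital nursing",
  "nursing staff and patient outcomes"
)

corpus <- Corpus(VectorSource(abstracts))

# Documents in rows, terms in columns: the orientation LDA() expects
dtm <- DocumentTermMatrix(corpus)

# Drop documents with no terms left, since LDA() rejects all-zero rows
dtm <- dtm[slam::row_sums(dtm) > 0, ]

fit <- LDA(dtm, k = 2, control = list(seed = 505))

# The top terms per topic are now words from the abstracts
terms(fit, 3)
```

With the real data, the equivalent change would be building the matrix with DocumentTermMatrix(AbstractCorpus) and passing that to LDA(). Transposing the existing matrix with t(AbstractTDM) should give the same orientation.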

answered Sep 22 '22 16:09

Kara Woo

Tags: r, lda, topicmodels
