List of word frequencies using R

I have been using the tm package to run some text analysis. My problem is with creating a list of words and their associated frequencies.

    library(tm)
    library(RWeka)

    txt <- read.csv("HW.csv", header = TRUE)

    df <- do.call("rbind", lapply(txt, as.data.frame))
    names(df) <- "text"

    myCorpus <- Corpus(VectorSource(df$text))
    myStopwords <- c(stopwords('english'), "originally", "posted")
    myCorpus <- tm_map(myCorpus, removeWords, myStopwords)

    # building the TDM
    btm <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
    myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = btm))

I typically use the following code to generate a list of words within a frequency range:

    frq1 <- findFreqTerms(myTdm, lowfreq = 50)

Is there a way to automate this so that I get a data frame with all words and their frequencies?

The other problem I face is converting the term-document matrix into a data frame. Because I am working with large samples of data, I run into memory errors. Is there a simple solution for this?

asked Aug 07 '13 10:08

ProcRJ

2 Answers

Try this:

    library(tm)  # needed for the "crude" example corpus and TermDocumentMatrix()

    data("crude")
    myTdm <- as.matrix(TermDocumentMatrix(crude))
    FreqMat <- data.frame(ST = rownames(myTdm),
                          Freq = rowSums(myTdm),
                          row.names = NULL)
    head(FreqMat, 10)
    #            ST Freq
    # 1       "(it)    1
    # 2     "demand    1
    # 3  "expansion    1
    # 4        "for    1
    # 5     "growth    1
    # 6         "if    1
    # 7         "is    2
    # 8        "may    1
    # 9       "none    2
    # 10      "opec    2
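One caveat for large corpora: `as.matrix()` expands the sparse term-document matrix into a dense one, which is a likely source of the memory errors mentioned in the question. A possible variation (a sketch, not part of the original answer) is to sum the rows of the sparse matrix directly. This assumes the `slam` package, which `tm` relies on for its sparse matrices, is installed; the names `tdm`, `termFreq` and `FreqDF` are only illustrative.

```r
library(tm)    # provides TermDocumentMatrix(), Terms() and the "crude" corpus
library(slam)  # sparse-matrix helpers (assumed to be installed alongside tm)

data("crude")
tdm <- TermDocumentMatrix(crude)

# row_sums() operates on the sparse representation,
# so the full dense matrix is never built
termFreq <- slam::row_sums(tdm)

# One row per term, with its total count across all documents
FreqDF <- data.frame(ST = Terms(tdm),
                     Freq = as.vector(termFreq),
                     row.names = NULL,
                     stringsAsFactors = FALSE)

# Most frequent terms first
FreqDF <- FreqDF[order(FreqDF$Freq, decreasing = TRUE), ]
head(FreqDF, 10)
```

Filtering the result, for example `FreqDF[FreqDF$Freq >= 50, ]`, should then select the same terms that `findFreqTerms(myTdm, lowfreq = 50)` returns, with their counts attached.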

answered Sep 23 '22 08:09

David Arenburg

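The following lines of R build a table of word frequencies. They read a plain-text (.txt) file, count how often each word occurs, and write the resulting table to a file. I hope this helps anyone interested.

    avisos  <- scan("anuncio.txt", what = "character", sep = "\n")  # read the file line by line
    avisos1 <- tolower(avisos)                 # lower-case so "Word" and "word" match
    avisos2 <- strsplit(avisos1, "\\W+")       # split on runs of non-word characters
    avisos3 <- unlist(avisos2)
    avisos3 <- avisos3[avisos3 != ""]          # drop empty strings left by the split
    freq  <- table(avisos3)                    # count each word
    freq1 <- sort(freq, decreasing = TRUE)     # most frequent first
    temple.sorted.table <- paste(names(freq1), freq1, sep = "\t")  # a real tab, not a literal "\t"
    # Write to a separate file so the input is not overwritten
    # (the output file name is a placeholder)
    cat("Word\tFREQ", temple.sorted.table, file = "anuncio_freq.txt", sep = "\n")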
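To get a data frame rather than a text file, the same base-R idea can be sketched as follows. This is only an illustration: the `txt_lines` vector stands in for the result of `scan()`, and the names `word_freq` and `freq_df` are made up for this example.

```r
# Stand-in for text read with scan(); replace with real input
txt_lines <- c("The cat sat on the mat.", "The dog sat too!")

# Lower-case, split on runs of non-word characters, and drop empty strings
words <- unlist(strsplit(tolower(txt_lines), "\\W+"))
words <- words[words != ""]

# Count each word and sort from most to least frequent
word_freq <- sort(table(words), decreasing = TRUE)

# Convert the table into a two-column data frame
freq_df <- data.frame(Word = names(word_freq),
                      Freq = as.vector(word_freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```

Because everything stays in base R, this avoids building a term-document matrix at all, which may be enough when only overall word counts are needed.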

answered Sep 22 '22 08:09

alejandro
