
list of word frequencies using R

I have been using the tm package to run some text analysis. My problem is creating a list of words together with their frequencies.

library(tm)
library(RWeka)

txt <- read.csv("HW.csv", header = TRUE)

df <- do.call("rbind", lapply(txt, as.data.frame))
names(df) <- "text"

myCorpus <- Corpus(VectorSource(df$text))
myStopwords <- c(stopwords('english'), "originally", "posted")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)

# building the TDM
btm <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = btm))

I typically use the following code to generate a list of words at or above a given frequency:

frq1 <- findFreqTerms(myTdm, lowfreq=50) 

Is there any way to automate this so that we get a data frame with all words and their frequencies?

The other problem I face is converting the term-document matrix into a data frame. Because I am working with large samples of data, I run into memory errors. Is there a simple solution for this?

ProcRJ asked Aug 07 '13 10:08



2 Answers

Try this:

data("crude")
myTdm <- as.matrix(TermDocumentMatrix(crude))
FreqMat <- data.frame(ST = rownames(myTdm),
                      Freq = rowSums(myTdm),
                      row.names = NULL)
head(FreqMat, 10)
#            ST Freq
# 1       "(it)    1
# 2     "demand    1
# 3  "expansion    1
# 4        "for    1
# 5     "growth    1
# 6         "if    1
# 7         "is    2
# 8        "may    1
# 9       "none    2
# 10      "opec    2
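A note on the memory errors mentioned in the question: `as.matrix()` expands the sparse term-document matrix into a dense one, which is probably where large corpora run out of memory. The following is a minimal sketch, not part of the original answer, that sums the rows while the matrix is still sparse. It assumes the slam package (a dependency of tm) is installed; `termFreq` is a name introduced here for illustration.

```r
library(tm)    # TermDocumentMatrix and the "crude" example corpus
library(slam)  # row_sums() for sparse simple_triplet_matrix objects

data("crude")
myTdm <- TermDocumentMatrix(crude)  # kept sparse; no as.matrix() call

# Sum each term's counts across all documents without densifying.
# termFreq is an illustrative name, not from the original post.
termFreq <- slam::row_sums(myTdm)

FreqMat <- data.frame(ST = names(termFreq),
                      Freq = as.vector(termFreq),
                      row.names = NULL,
                      stringsAsFactors = FALSE)

# Most frequent terms first
FreqMat <- FreqMat[order(FreqMat$Freq, decreasing = TRUE), ]
head(FreqMat, 10)
```

The same call should work on a matrix built with a custom tokenizer such as the trigram one in the question, since only the row sums are ever materialized.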
David Arenburg answered Sep 23 '22 08:09


The following R code builds a table of word frequencies: it reads a text file in .txt format, counts the words, and writes the result out as a tab-separated table. I hope it helps anyone interested.

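# Read the text file, one element per line
avisos <- scan("anuncio.txt", what = "character", sep = "\n")
avisos1 <- tolower(avisos)
# Split on runs of non-word characters and drop empty strings
avisos2 <- strsplit(avisos1, "\\W+")
avisos3 <- unlist(avisos2)
avisos3 <- avisos3[avisos3 != ""]
# Count each word and sort by descending frequency
freq <- table(avisos3)
freq1 <- sort(freq, decreasing = TRUE)
# Build tab-separated "word<TAB>count" lines
temple.sorted.table <- paste(names(freq1), freq1, sep = "\t")
# Write to a separate file (hypothetical name) so the input is not overwritten
cat("Word\tFREQ", temple.sorted.table, file = "anuncio_freq.txt", sep = "\n")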
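If a data frame is wanted rather than a file, the same base-R idea can be sketched as below. This is only an illustration: the `lines` vector is made-up sample text standing in for the contents of a .txt file, and `freq_df` is a name introduced here.

```r
# Hypothetical sample text standing in for lines read from a file
lines <- c("The cat sat on the mat.", "The dog sat too.")

# Lower-case, split on runs of non-word characters, drop empty strings
words <- unlist(strsplit(tolower(lines), "\\W+"))
words <- words[nzchar(words)]

# Count each word and sort by descending frequency
freq <- sort(table(words), decreasing = TRUE)

# Convert the table to a two-column data frame
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("Word", "Freq")
head(freq_df)
```

Filtering by a frequency range is then ordinary subsetting, for example `freq_df[freq_df$Freq >= 2, ]`.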
alejandro answered Sep 22 '22 08:09