Creating "word" cloud of phrases, not individual words in R

Tags: r, word-cloud

I am trying to make a word cloud from a list of phrases, many of which are repeated, instead of from individual words. My data looks something like this, with one column of my data frame being a list of phrases.

df$names <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C")

I would like to make a word cloud where all of these names are treated as individual phrases whose frequency is displayed, not the words which make them up. The code I have been using looks like:

library(tm)
library(wordcloud)
library(RColorBrewer)
df.corpus <- Corpus(DataframeSource(data.frame(df$names)))
df.corpus <- tm_map(df.corpus, function(x) removeWords(x, stopwords("english")))
#turning that corpus into a TDM (term-document matrix)
tdm <- TermDocumentMatrix(df.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
#making a wordcloud
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word,d$freq, scale=c(8,.3),min.freq=2,max.words=100, random.order=T, rot.per=.15, colors="black", vfont=c("sans serif","plain"))
dev.off()

This creates a word cloud, but it is built from each component word, not from the phrases. So I see the relative frequency of "A", "H", "John", etc. instead of the relative frequency of "Joseph A", "Mary A", etc., which is what I want.

I'm sure this isn't that complicated to fix, but I can't figure it out! I would appreciate any help.


asked Nov 14 '14 20:11

verybadatthis

2 Answers

Your difficulty is that the tm functions treat each element of df$names as a "document" and split it into words. For example, the document Mary A contains the words Mary and A. It sounds like you want to keep the names as they are and simply count their occurrences, which you can do with table.

library(wordcloud)
df <- data.frame(theNames = c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C"))
tb <- table(df$theNames)
wordcloud(names(tb), as.numeric(tb), scale = c(8, .3), min.freq = 1, max.words = 100, random.order = TRUE, rot.per = .15, colors = "black", vfont = c("sans serif", "plain"))
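
To check what the cloud will be drawn from, it can help to look at the phrase counts on their own first. The following is a minimal base-R sketch, assuming the same sample names as above; the variable names phrases and phrase_freq are made up for illustration and are not part of any package.

```r
# Sample phrases from the question; each whole string is one "term".
phrases <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C")

# table() counts identical strings, so multi-word names are never split.
tb <- sort(table(phrases), decreasing = TRUE)

# Hypothetical helper data frame: one row per distinct phrase.
phrase_freq <- data.frame(
  phrase = names(tb),
  freq = as.integer(tb),
  stringsAsFactors = FALSE
)

print(phrase_freq)
```

The phrase and freq columns can then be passed to wordcloud() in place of names(tb) and as.numeric(tb).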



answered Sep 29 '22 00:09

keegan

Install RWeka and its dependencies, then try this:

library(RWeka)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
# ... other tokenizers
tok <- BigramTokenizer
tdmgram <- TermDocumentMatrix(df.corpus, control = list(tokenize = tok))
#... create wordcloud

The tokenizer line above chops your text into phrases of exactly two words.
More specifically, it sets both the minimum and the maximum phrase length to 2.
Using Weka's general NGramTokenizer, you can create different tokenizers (e.g. min = 1, max = 2), and you'll probably want to experiment with different lengths. You can also give them short names such as tok1 and tok2 instead of the verbose "BigramTokenizer" I've used above.
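
To get a feel for what different minimum and maximum lengths do to a name like "Paul H C", here is a rough base-R sketch of word n-gram tokenization. It is not RWeka's implementation, the helper name word_ngrams is made up for illustration, and the order of the results may differ from what NGramTokenizer returns.

```r
# Hypothetical helper (not part of RWeka or tm): returns all word n-grams
# of length min_n..max_n from a single string.
word_ngrams <- function(x, min_n = 2, max_n = 2) {
  words <- strsplit(x, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in seq(min_n, max_n)) {
    if (length(words) < n) next  # string is too short for n-grams of this length
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(
      starts,
      function(i) paste(words[i:(i + n - 1)], collapse = " "),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

word_ngrams("Paul H C")        # bigrams only
word_ngrams("Paul H C", 1, 2)  # single words plus bigrams
word_ngrams("John")            # no bigrams in a one-word name
```

Under this sketch, a pure bigram tokenizer produces nothing for a one-word name such as "John" and splits a three-word name into two overlapping pairs, so it is worth checking the resulting terms before plotting.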

answered Sep 29 '22 00:09

knb
