 

Spaces in wordcloud

Tags:

r

word-cloud

I currently use Wordle for many artistic word clouds, but I think R's wordcloud package potentially offers finer control.

1) How do you keep a word capitalized in the word cloud? [SOLVED]

2) How do I keep two words together as one chunk in the word cloud? (Wordle uses the ~ operator for this; R's wordcloud simply prints the ~ as is.) For instance, where there is a ~ between "to" and "be", I'd like a space in the word cloud.

library(wordcloud)

y<-c("the", "the", "the", "tree", "tree", "tree", "tree", "tree", 
"tree", "tree", "tree", "tree", "tree", "Wants", "Wants", "Wants", 
"Wants", "Wants", "Wants", "Wants", "Wants", "Wants", "Wants", 
"Wants", "Wants", "to~be", "to~be", "to~be", "to~be", "to~be", 
"to~be", "to~be", "to~be", "to~be", "to~be", "to~be", "to~be", 
"to~be", "to~be", "to~be", "to~be", "to~be", "to~be", "to~be", 
"to~be", "when", "when", "when", "when", "when", "familiar", "familiar", 
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar", 
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar", 
"familiar", "familiar", "familiar", "familiar", "familiar", "familiar", 
"leggings", "leggings", "leggings", "leggings", "leggings", "leggings", 
"leggings", "leggings", "leggings", "leggings")

wordcloud(names(table(y)), table(y))
Tyler Rinker asked Oct 09 '22 12:10


1 Answer

You asked two questions:

  1. You can control capitalisation by passing a control argument to TermDocumentMatrix (from the tm package): control = list(tolower = FALSE) keeps words such as "Wants" capitalised.
  2. No doubt there is an argument somewhere to control the ~, but here is an easy workaround: use gsub to change each ~ to a space in the step just before plotting.

Some code:

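library(tm)         # Corpus, VectorSource, TermDocumentMatrix
library(wordcloud)

corpus <- Corpus(VectorSource(y))
tdm <- TermDocumentMatrix(corpus, control = list(tolower = FALSE)) ## Edit 1: keep capitalisation

m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)  # total frequency per term
d <- data.frame(word = names(v), freq = v)
d$word <- gsub("~", " ", d$word) ## Edit 2: show each ~ as a space

wordcloud(d$word, d$freq)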
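
If you do not need tm at all, the same two ideas can probably be applied straight to the table() call from the question. This is only a minimal sketch, not a tested replacement for the code above: the short vector y here is made-up stand-in data rather than the full vector from the question, and the requireNamespace guard is an addition so the counting part runs even where wordcloud is not installed.

```r
# Stand-in for the question's vector `y` (hypothetical, shortened sample data)
y <- c(rep("the", 3), rep("tree", 10), rep("Wants", 12), rep("to~be", 20))

# table() counts the strings exactly as given, so capitalisation is kept
freq <- table(y)

# Replace the literal ~ with a space in the labels only, just before plotting
words <- gsub("~", " ", names(freq), fixed = TRUE)
counts <- as.vector(freq)

# Plot only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words, counts, min.freq = 1)
}
```

Because the substitution happens after counting, "to~be" is still counted as a single term and only its label changes to "to be".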

Andrie answered Oct 18 '22 00:10
