
TermDocumentMatrix sometimes throwing error

I am creating a word cloud from tweets about various sports teams. This code executes successfully only about 1 in 10 times:

handle <- 'arsenal'
txt <- searchTwitter(handle,n=1000,lang='en')
t <- sapply(txt,function(x) x$getText())
t <- gsub('http.*\\s*|RT|Retweet','',t)
t <- gsub(handle,'',t)
t_c <- Corpus(VectorSource(t))
tdm = TermDocumentMatrix(t_c,control = list(removePunctuation = TRUE,stopwords = stopwords("english"),removeNumbers = TRUE, content_transformer(tolower)))
m = as.matrix(tdm)
word_freqs = sort(rowSums(m), decreasing=TRUE) 
dm = data.frame(word=names(word_freqs), freq=word_freqs)
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"),rot.per=0.5)

The other 9 out of 10 times, it throws the following error:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  : 
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
  NAs introduced by coercion

Any ideas guys? I've googled, but so far have come up short! Keep in mind I'm an absolute newbie in R!

Dan asked Sep 06 '14 10:09


2 Answers

So after a bit of playing around, the following line of code has completely fixed my issue:

t <- iconv(t,to="utf-8-mac")
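
Note that "utf-8-mac" is an encoding name that may only be available on macOS. As a rough, more portable sketch of the same idea (re-encoding the text so that tolower() does not hit invalid multibyte input), the example below uses base R's iconv() with sub = "" to drop anything that cannot be converted. The sample vector `tweets` is hypothetical and stands in for `t`, and the sketch assumes that losing non-ASCII characters such as emoji and accented letters is acceptable for a word cloud:

```r
# Hypothetical sample standing in for the tweet text vector `t`.
tweets <- c("Great win today \u26bd",
            "caf\u00e9 before the match",
            "plain ascii text")

# Convert to ASCII; sub = "" replaces non-convertible bytes with nothing,
# so problematic characters are dropped instead of producing NA.
clean <- iconv(tweets, from = "UTF-8", to = "ASCII", sub = "")

# tolower() now receives plain ASCII strings only.
lowered <- tolower(clean)
print(lowered)
```

If the accented characters matter, converting with to = "UTF-8" instead of "ASCII" may be preferable, though how invalid bytes are handled can vary by platform.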
Dan answered Nov 20 '22 16:11


I suspect you ran the following line of code somewhere before calling the DocumentTermMatrix function:

corpus = tm_map(corpus, PlainTextDocument)

This line converts every document in the corpus to a PlainTextDocument, on which the DocumentTermMatrix function does not work properly.

Just repeat the entire process of creating the corpus and preprocessing it, skipping the above command, and you will be good to go.
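
As a minimal sketch of what that could look like, assuming the tm package is installed: the corpus is rebuilt from the raw text and preprocessed without the PlainTextDocument step, with base functions such as tolower wrapped in content_transformer() so the documents keep their class. The sample text and variable names here are hypothetical:

```r
library(tm)

# Hypothetical sample text standing in for the cleaned tweets.
tweets <- c("Great WIN for the team today!",
            "What a goal, what a match")

# Rebuild the corpus from scratch.
corpus <- VCorpus(VectorSource(tweets))

# Wrap base functions in content_transformer() rather than calling
# tm_map(corpus, PlainTextDocument).
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# Build the matrix from the preprocessed corpus.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```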

ShivamS answered Nov 20 '22 18:11
