 

Error in .jcall()

Tags: r

I am running the following code and receiving this error:

Error in .jcall("RWekaInterfaces", "[S", "tokenize", .jcast(tokenizer, : java.lang.NullPointerException

setwd("C:\\Users\\jbarr\\Desktop\\test")
library(tm)
library(wordcloud)
library(RWeka)
library(tau)
library(xlsx)

Comment <- read.csv("testfile.csv",stringsAsFactors=FALSE) 
str(Comment) 
review_source <- VectorSource(Comment) 

corpus <- Corpus(review_source)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))  # lowercase first so capitalized stopwords also match
corpus <- tm_map(corpus, removeWords, stopwords(kind = "english"))
corpus <- tm_map(corpus, removeWords, c("member", "advise", "inform", "informed", "caller", "call","provided", "advised")) 


dtm <- DocumentTermMatrix(corpus)
dtm2 <- as.matrix(dtm)
wordfreq <- colSums(dtm2)
wordfreq <- sort(wordfreq, decreasing=TRUE)
head(wordfreq, n=100)
wfreq <- head(wordfreq, 500)
set.seed(142)
words <- names(wfreq)
dark2 <- brewer.pal(6, "Dark2")
wordcloud(words[1:100], wordfreq[1:100], rot.per=0.35, scale=c(2.7, .4), colors=dark2, random.order=FALSE)
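write.xlsx(as.data.frame(wfreq), "C:\\Users\\jbarr\\Desktop\\test\\wordfreq.xlsx")  # needs a file name, not just the folder; "wordfreq.xlsx" is illustrative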

The interesting part is that I have run this code on multiple files, and only some of them produce the error.
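Because only some files fail, one quick way to test the NA explanation given in the answers is to count missing values in each input file before building the corpus. A minimal base-R sketch; the helper name `count_na_per_file` and the temporary demo files are hypothetical, for illustration only:

```r
# Hypothetical helper: count NA cells in each CSV, reading it the same way
# as the question does (stringsAsFactors = FALSE).
count_na_per_file <- function(files) {
  vapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    sum(is.na(df))  # total NA cells across all columns
  }, integer(1))
}

# Demo with two temporary files: one clean, one with a missing value.
clean_file <- tempfile(fileext = ".csv")
na_file    <- tempfile(fileext = ".csv")
write.csv(data.frame(text = c("first comment", "second comment")),
          clean_file, row.names = FALSE)
write.csv(data.frame(text = c("first comment", NA)),
          na_file, row.names = FALSE)

na_counts <- count_na_per_file(c(clean_file, na_file))
print(unname(na_counts))
```

Any file with a non-zero count is a candidate for the `.jcall` failure.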

JBARR asked Nov 27 '25 21:11


2 Answers

Sanmeet is right: the problem is the NA values in your data frame.

Just before your line review_source <- VectorSource(Comment), insert the line below:

Comment[is.na(Comment)] <- "NULLVALUEENTERED"  # the logical matrix from is.na() targets individual cells

This replaces every NA value with the placeholder NULLVALUEENTERED (change it to whatever you like). With no NAs left, the code should run fine.
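A self-contained sketch of this replacement on a made-up data frame; the column names and values are hypothetical stand-ins for testfile.csv. Indexing the data frame with the logical matrix returned by is.na() replaces only the NA cells:

```r
# Toy data frame standing in for the result of read.csv() (made-up values).
Comment <- data.frame(
  id   = c("a", "b", "c"),
  text = c("called about billing", NA, "asked for a refund"),
  stringsAsFactors = FALSE
)

# Logical-matrix indexing: every TRUE cell (an NA) gets the placeholder.
Comment[is.na(Comment)] <- "NULLVALUEENTERED"

print(Comment$text)
```

Passing which(is.na(Comment)) instead gives integer positions, which a data frame interprets as column numbers rather than cells, so the logical matrix is the safer index here.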

Chelsie answered Nov 29 '25 10:11


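You are getting the error in the tokenizer because of NA values in Comment. Note that read.csv returns a data frame, so drop the incomplete rows:

Comment <- read.csv("testfile.csv", stringsAsFactors = FALSE)
str(Comment)
nrow(Comment)   # rows before filtering
Comment <- Comment[complete.cases(Comment), , drop = FALSE]   # keep rows with no NA in any column
nrow(Comment)   # rows after filtering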

Alternatively, use is.na(), which returns the non-NA cells as a plain vector:

Comment <- Comment[!is.na(Comment)]

Then apply the preprocessing steps, create the corpus, and so on.
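A small sketch contrasting the two filtering options on a made-up data frame (the values are hypothetical). complete.cases() keeps whole rows and the result stays a data frame, while indexing with !is.na() flattens the non-NA cells into a vector:

```r
# Toy data frame in place of the question's CSV (made-up values).
Comment <- data.frame(
  text = c("called about billing", NA, "asked for a refund"),
  stringsAsFactors = FALSE
)

# Option 1: keep only rows with no NA in any column (still a data frame).
kept_rows <- Comment[complete.cases(Comment), , drop = FALSE]
print(nrow(kept_rows))

# Option 2: logical-matrix indexing returns a plain vector of non-NA cells.
kept_cells <- Comment[!is.na(Comment)]
print(kept_cells)
```

With several columns, option 2 mixes cells from all of them into one vector, so option 1 is the closer match when each row is one comment.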

Hope this helps.

Shobha Mourya answered Nov 29 '25 10:11


