 

R: invalid multibyte string 1

I'm new to R.

I'm now studying text mining with the "tm" package.

I have a problem converting text to lower case:

library(tm)
sms_raw <- read.csv(............)
sms_corpus <- Corpus(VectorSource(sms_raw$text))
sms_corpus <- tm_map(sms_corpus, content_transformer(tolower))
Error: invalid multibyte string 1

I thought my CSV file might not be UTF-8, so I re-saved it as UTF-8, but that didn't help.
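One possible reason re-saving did not help (an assumption, not a confirmed diagnosis): read.csv does not detect a file's encoding, and on Windows 8.1 R's native encoding is typically a code page such as CP1252 rather than UTF-8. A minimal sketch that tells read.csv the strings are UTF-8 through its encoding argument; the file, its contents, and the column names are made up for illustration:

```r
# Hypothetical example file: a small CSV whose bytes are UTF-8.
path <- tempfile(fileext = ".csv")
# "Caf\xc3\xa9" is the UTF-8 byte sequence for "Café"
writeLines(c("type,text", "ham,Caf\xc3\xa9 tonight?"), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings read from the file as UTF-8
# instead of assuming the session's native encoding; it does not re-encode them
sms_raw <- read.csv(path, stringsAsFactors = FALSE, encoding = "UTF-8")
Encoding(sms_raw$text)

unlink(path)  # remove the temporary file
```

The sibling argument fileEncoding = "UTF-8" works differently: it re-encodes the input to the native encoding while reading.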

My OS is Windows 8.1.

If anyone has a solution to this problem, please let me know.

Dalgarim asked Nov 05 '14 at 07:11





1 Answer

I easily solved the error by converting the text's encoding.

The column named text in my file contains multibyte characters.

So I typed:

sms_raw$text <- iconv(enc2utf8(sms_raw$text),sub="byte")

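This command converts the multibyte text column to UTF-8. With sub = "byte", any byte that cannot be converted is replaced by its hex code in the form <xx> instead of making the whole string NA.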
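The one-liner above relies on iconv's defaults from = "" and to = "", that is, the session's native encoding, so its exact effect can vary between machines. A minimal sketch of a variant that instead assumes the text is meant to be UTF-8 and says so explicitly; the helper clean_utf8 and the sample vector are hypothetical, standing in for sms_raw$text:

```r
# Hypothetical helper: force a character vector to valid UTF-8,
# assuming the text is meant to be UTF-8 regardless of the locale.
clean_utf8 <- function(x) {
  # iconv ignores encoding marks and treats the bytes as UTF-8 here;
  # bytes that are not valid UTF-8 are replaced by "<xx>" hex codes
  iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
}

# Hypothetical sample: "\xe9" on its own is not a valid UTF-8 sequence,
# which is the kind of input that makes tolower() fail
txt <- c("Hello WORLD", "Call me at the caf\xe9")

cleaned <- clean_utf8(txt)
cleaned

# After cleaning, every element is valid UTF-8, so tolower() can process it
lowered <- tolower(cleaned)
lowered
```

On the real data this would be sms_raw$text <- clean_utf8(sms_raw$text) before building the corpus. Passing sub = "" instead of "byte" would drop the invalid bytes rather than show them.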

Dalgarim answered Oct 16 '22 at 13:10
