
How do I solve the error **Error in nchar(rownames(m)) : invalid multibyte string, element 1**?

Tags: r, lda

# Create document-term matrix

dtm <- DocumentTermMatrix(docs, control = params)

Error in nchar(rownames(m)) : invalid multibyte string, element 1

Does anyone know how to tackle this error? I'm working in RStudio.

Senne Meneghini asked Nov 25 '25 05:11


2 Answers

Sys.setlocale("LC_ALL", "C")

Run this in RStudio. It resets the locale to "C", and it has worked for me many times.
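If changing every locale category feels too broad, here is a minimal sketch of a narrower variant. It assumes that only the character-type category (`LC_CTYPE`) matters for this error, which is my assumption rather than something stated above. It saves the current setting, switches to "C" for the problematic step, and restores it afterwards. The string `x` is a made-up example containing a byte that is not valid UTF-8.

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Switch only LC_CTYPE to "C" (assumption: this category is the relevant one).
Sys.setlocale("LC_CTYPE", "C")

# Hypothetical string with a stray Latin-1 byte (0xE9) that is invalid UTF-8.
x <- "caf\xe9"

# In the "C" locale strings are treated as single bytes, so nchar()
# counts bytes instead of failing on an invalid multibyte sequence.
n <- nchar(x)

# Put the original locale back once the problematic step is done.
Sys.setlocale("LC_CTYPE", old_ctype)
```

Keep in mind that while the "C" locale is active, non-ASCII characters are handled as raw bytes, so accented letters may print or tokenize oddly. This works around the error without repairing the underlying encoding.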

Sahil Sharma answered Nov 27 '25 21:11


This happens when your input text isn't UTF-8 encoded. You can read about character encoding here.

Another good reference is this

I've found that the best way to handle these issues is to use stringr::str_conv.

mydocs <- c("doc1", "doc2", "doc3")

stringr::str_conv(mydocs, "UTF-8")

Where you have non-UTF-8 characters, you'll get a warning, but the character vector that comes out the other side will be usable.

Do that to your docs vector before calling `DocumentTermMatrix`.
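If you want to see which elements are actually causing trouble before converting anything, here is a base-R sketch. It is an alternative to `str_conv`, not what I described above. `docs_raw` is a hypothetical input vector, and the code assumes the bad text is Latin-1 encoded, which you would need to confirm for your own data.

```r
# Hypothetical input: the second element holds a Latin-1 byte (0xE9)
# that is not valid UTF-8 on its own.
docs_raw <- c("plain ascii", "caf\xe9", "another doc")

# validUTF8() is part of base R; it returns FALSE for elements whose
# bytes do not form valid UTF-8.
bad <- !validUTF8(docs_raw)
which(bad)  # indices of the offending documents

# Assumption: the offending text is Latin-1. Convert only those elements.
docs_clean <- docs_raw
docs_clean[bad] <- iconv(docs_raw[bad], from = "latin1", to = "UTF-8")

# Every element should now be valid UTF-8, so nchar() can count characters.
stopifnot(all(validUTF8(docs_clean)))
nchar(docs_clean)
```

If the real source encoding is something other than Latin-1, change the `from` argument accordingly. Guessing wrong will still give you valid UTF-8, but with the wrong characters.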

Tommy Jones answered Nov 27 '25 21:11


