dtm <- DocumentTermMatrix(docs, control = params)
Error in nchar(rownames(m)) : invalid multibyte string, element 1
Does anyone know how to tackle this error? I'm working in RStudio.
Sys.setlocale( 'LC_ALL','C' )
Run this code in RStudio. It resets the locale to C. It has worked for me many times.
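If you would rather not leave the whole session in the C locale, a minimal sketch along these lines switches only the character-type category and restores it afterwards. The variable name `old_ctype` is just for illustration, and whether the locale change alone is enough for your data is an assumption you would need to check.

```r
# Remember the current character-type locale so it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# Switch only LC_CTYPE to "C"; strings are then treated as plain bytes.
Sys.setlocale("LC_CTYPE", "C")

# ... build the document-term matrix here ...

# Put the original locale back when done.
Sys.setlocale("LC_CTYPE", old_ctype)
```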
This happens when your input text isn't UTF-8 encoded. You can read about character encoding here.
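To see which documents are affected, something like the following base R check may help. `docs_raw` is a made-up vector standing in for your text, and `validUTF8()` needs R 3.3.0 or later.

```r
# Hypothetical input: the second element carries a raw Latin-1 byte (0xE9),
# which is not a valid UTF-8 sequence on its own.
docs_raw <- c("plain ascii text", "caf\xe9 au lait")

# validUTF8() returns FALSE for elements that contain invalid byte sequences.
is_valid <- validUTF8(docs_raw)

# Positions of the documents that need to be re-encoded.
bad_idx <- which(!is_valid)
```

Here `bad_idx` points at the elements to inspect or convert before building the matrix.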
Another good reference is this
I've found that the best way to handle these issues is to use stringr::str_conv.
mydocs <- c("doc1", "doc2", "doc3")
# Assign the result; str_conv returns the converted vector rather than modifying mydocs
mydocs <- stringr::str_conv(mydocs, "UTF-8")
Where you have non-UTF-8 characters, you'll get a warning, but the character vector that comes out the other side will be usable.
Do that to your docs vector before calling `DocumentTermMatrix`.
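If you happen to know the real source encoding, converting from it keeps the original characters instead of replacing them. The sketch below swaps in base R's `iconv()` for `stringr::str_conv` and assumes the stray bytes are Latin-1, which may not be true of your data. `docs_raw` and `docs_clean` are made-up names.

```r
# Hypothetical input; the second element holds a raw Latin-1 byte (0xE9).
docs_raw <- c("plain ascii", "caf\xe9")

# Assumption: the source encoding is Latin-1. Converting from the real
# encoding keeps the accented character rather than dropping it.
docs_clean <- iconv(docs_raw, from = "latin1", to = "UTF-8")

# Check that every element is valid UTF-8 before building the corpus.
all(validUTF8(docs_clean))

# nchar() is the call that raised the original error; it should now succeed.
nchar(docs_clean)
```

The cleaned vector is what you would then turn into a corpus and pass to `DocumentTermMatrix`.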