
Why do some Unicode characters display in matrices, but not data frames in R?

Tags: r

In at least some cases, Asian characters print correctly when they are contained in a matrix or a vector, but not when they are in a data.frame. Here is an example:

    q <- '天'
    q # Works
    # [1] "天"

    matrix(q) # Works
    #      [,1]
    # [1,] "天"

    q2 <- data.frame(q, stringsAsFactors=FALSE)
    q2 # Does not work
    #          q
    # 1 <U+5929>

    q2[1,] # Works again.
    # [1] "天"

Clearly, my device is capable of displaying the character, but when it is in a data.frame, it does not work.

Doing some digging, I found that the print.data.frame function runs format on each column. It turns out that if you run format.default directly, the same problem occurs:

    format(q)
    # "<U+5929>"

Digging into format.default, I find that it is calling the internal format, written in C.
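For anyone trying to reproduce this, a small diagnostic sketch along these lines may help show whether the behaviour is tied to the session locale rather than to data.frame itself. It only uses base R calls; the character is written as a `\u` escape (an assumption made here so that the source file's encoding does not affect the test), and the output will differ between machines.

```r
# Diagnostic sketch: compare what the session locale is with how the string is encoded.
q <- "\u5929"                       # same character as '天', written as an escape

print(Sys.getlocale("LC_CTYPE"))    # character-type locale, e.g. English_Canada.1252 in the session above
print(l10n_info())                  # reports whether the session is multibyte / UTF-8
print(Encoding(q))                  # declared encoding of the string
print(nchar(q))                     # number of characters in the string

formatted <- format(q)              # the call that print.data.frame applies to each column
print(formatted)                    # may show the character or an escape, depending on the locale
```

If `format(q)` shows the `<U+5929>` escape while `q` itself prints fine, the difference is coming from `format` and the locale, not from the data.frame.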

Before I dig any further, I want to know if others can reproduce this behaviour. Is there some configuration of R that would allow me to display these characters within data.frames?

My sessionInfo(), if it helps:

    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
    [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Canada.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] tools_3.0.1
asked Jul 18 '13 06:07 by nograpes


1 Answer

I hate to answer my own question, but although the comments and answers helped, they weren't quite right. On Windows, it doesn't seem possible to set a generic 'UTF-8' locale. You can, however, set a country-specific locale, which works in this case:

    Sys.setlocale("LC_CTYPE", locale="Chinese")
    q2 # Works fine
    #  q
    #1 天
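Since changing `LC_CTYPE` affects the whole session, it may be worth saving the previous value and restoring it afterwards. The following is only a sketch of that pattern: the variable names `old_ctype` and `new_ctype` are made up for illustration, and `"Chinese"` is a Windows locale name that other platforms will most likely reject, in which case `Sys.setlocale` returns an empty string with a warning.

```r
# Sketch: switch LC_CTYPE temporarily, print the data.frame, then restore the old locale.
q2 <- data.frame(q = "\u5929", stringsAsFactors = FALSE)

old_ctype <- Sys.getlocale("LC_CTYPE")   # remember the current setting

# "Chinese" is a Windows locale name; the request may fail elsewhere.
new_ctype <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale = "Chinese"))
if (identical(new_ctype, "")) {
  message("The 'Chinese' locale is not available here; LC_CTYPE was left unchanged.")
}

print(q2)                                # on Windows with the locale set, the character should display

# Put the original locale back so the rest of the session is unaffected.
invisible(Sys.setlocale("LC_CTYPE", locale = old_ctype))
```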

It does make me wonder, though, why exactly format depends on the locale, and whether there is a way to have it ignore the locale on Windows. I also wonder whether there is some generic UTF-8 locale on Windows that I don't know about.

answered Sep 21 '22 14:09 by nograpes