In at least some cases, Asian characters print correctly when they are in a matrix or a vector, but not in a data.frame. Here is an example:
    q <- '天'
    q # Works
    # [1] "天"
    matrix(q) # Works
    #      [,1]
    # [1,] "天"
    q2 <- data.frame(q, stringsAsFactors=FALSE)
    q2 # Does not work
    #          q
    # 1 <U+5929>
    q2[1,] # Works again.
    # [1] "天"
Clearly, my device is capable of displaying the character, but when it is in a data.frame, it does not work.
Doing some digging, I found that the print.data.frame function runs format on each column. It turns out that if you run format.default directly, the same problem occurs:
    format(q)
    # "<U+5929>"
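To tell whether the escaping comes from the string itself or from the session, it can help to look at the string's declared encoding next to the locale that R is using. The following is a minimal sketch using only base R; the variable names are illustrative, and the character is built from its code point so that the script stays ASCII-only.

```r
# Build the character from its code point (U+5929).
q <- "\u5929"

# Declared encoding of the string and its size in characters and bytes.
enc     <- Encoding(q)
n_chars <- nchar(q, type = "chars")
n_bytes <- nchar(q, type = "bytes")

# Locale used for character handling, and whether the session is UTF-8.
ctype   <- Sys.getlocale("LC_CTYPE")
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

# format() is the function that print.data.frame applies to each column.
formatted <- format(q)

cat("declared encoding:", enc, "\n")
cat("chars / bytes:    ", n_chars, "/", n_bytes, "\n")
cat("LC_CTYPE:         ", ctype, "\n")
cat("UTF-8 session:    ", is_utf8, "\n")
cat("format() result:  ", formatted, "\n")
```

If the string is marked as UTF-8 but the session is not a UTF-8 one and LC_CTYPE names a code page that cannot represent the character, that is consistent with the escaped output shown above.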
Digging into format.default, I find that it is calling the internal format, written in C.
Before I dig any further, I want to know if others can reproduce this behaviour. Is there some configuration of R that would allow me to display these characters within data.frames?
My sessionInfo(), if it helps:
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
    [3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
    [5] LC_TIME=English_Canada.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    loaded via a namespace (and not attached):
    [1] tools_3.0.1
I hate to answer my own question, but although the comments and answers helped, they weren't quite right. On Windows, it does not seem possible to set a generic 'UTF-8' locale. You can, however, set a country-specific locale, which works in this case:
    Sys.setlocale("LC_CTYPE", locale="Chinese")
    q2 # Works fine
    #   q
    # 1 天
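Changing LC_CTYPE affects the whole session, so it may be safer to switch only for the duration of the print and then restore the previous value. The sketch below assumes a hypothetical helper named with_ctype (not part of base R); it relies on Sys.setlocale returning an empty string, with a warning, when the requested locale is unavailable.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE,
# then restore whatever was set before.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" if the request cannot be honoured.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    message("Locale '", locale, "' is not available; keeping '", old, "'")
  }

  # `expr` is a promise, so it is only evaluated here, after the switch.
  invisible(force(expr))
}

q2 <- data.frame(q = "\u5929", stringsAsFactors = FALSE)

# "Chinese" is a Windows locale name; on other platforms the helper
# reports that it is unavailable and prints under the current locale.
with_ctype("Chinese", print(q2))
```

Because the expression is evaluated lazily inside the helper, only that one print call sees the temporary locale.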
But it does make me wonder why exactly format seems to use the locale; I wonder if there is a way to have it ignore the locale on Windows. I also wonder if there is some generic UTF-8 locale on Windows that I don't know about.
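One possible way to sidestep the locale without changing it builds on the observation in the question that a matrix prints correctly: convert the data.frame to a character matrix just for display. This is only a sketch under that assumption, and I have not confirmed that it avoids the escaping in every configuration.

```r
q2 <- data.frame(q = "\u5929", stringsAsFactors = FALSE)

# as.matrix() on a data frame whose columns are all character
# gives a character matrix with the same column names.
m <- as.matrix(q2)

# Print through the matrix method instead of print.data.frame.
print(m, quote = FALSE)
```

The data.frame itself is left unchanged; only the printed representation goes through the matrix.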