
How to display and input Chinese (and other non-ASCII) characters in the R console?

Tags:

r

My system: Windows 7 Ultimate 64-bit (English version) with R 3.1 (64-bit).
Here is my sessionInfo():

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252     
LC_MONETARY=English_United States.1252 LC_NUMERIC=C      
LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

1. Can't input Chinese characters into the R console
When I type a Chinese character into the R console, it turns into garbled characters.


2. Can't display Chinese characters in the R console
When I read data in the R console, the Chinese characters turn into garbled characters.
You can download the data and test it with:

read.table("r1.csv",sep=",")

Download Data


Please see the screenshot if you are unsure how to download the data from my site.

How can I set up my PC to properly display and input Chinese characters in the R console? I have updated the Chinese language pack and enabled it, but the problem remains.

showkey asked Jun 29 '14 06:06


1 Answer

It is probably not very well documented, but you can use Sys.setlocale to make the console handle Chinese, and the same method applies to many other languages. The solution is not obvious, because the official documentation for Sys.setlocale does not specifically mention it as a way to fix display issues.

> print('ÊÔÊÔ') #试试, meaning let's give it a shot in Chinese
[1] "ÊÔÊÔ" #won't show up correctly
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> Sys.setlocale(category = "LC_ALL", locale = "chs") #cht for traditional Chinese, etc.
[1] "LC_COLLATE=Chinese_People's Republic of China.936;LC_CTYPE=Chinese_People's Republic of China.936;LC_MONETARY=Chinese_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese_People's Republic of China.936"
> print('试试')
[1] "试试"
> read.table("c:/CHS.txt",sep=" ") #Chinese: the 1st record/observation
  V1   V2  V3 V4  V5   V6
1 122 第一 122 条 122 记录 
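
The garbled 'ÊÔÊÔ' in the transcript above is what one would expect when the console holds GBK (code page 936) bytes but interprets them as Windows-1252. The following is a minimal sketch of that reading, not part of the original answer. It assumes the local iconv build recognizes the encoding names "GBK" and "CP1252", which can vary by platform.

```r
# Bytes of the two-character test string in GBK (code page 936): CA D4 CA D4.
gbk_bytes <- as.raw(c(0xCA, 0xD4, 0xCA, 0xD4))
x <- rawToChar(gbk_bytes)

# Interpret the bytes as GBK and convert them to UTF-8.
as_gbk <- iconv(x, from = "GBK", to = "UTF-8")

# Interpret the very same bytes as Windows-1252 instead.
as_cp1252 <- iconv(x, from = "CP1252", to = "UTF-8")

# Compare the resulting UTF-8 bytes; printing the strings themselves
# would depend on the current locale.
charToRaw(as_gbk)
charToRaw(as_cp1252)
```

The two conversions start from identical bytes and differ only in the assumed source encoding, which is the same choice that LC_CTYPE makes for the console.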

If you just want to change the display encoding, without changing other aspects of locales, use LC_CTYPE instead of LC_ALL:

> Sys.setlocale(category = "LC_CTYPE", locale = "chs")
[1] "Chinese_People's Republic of China.936"
> print('试试')
[1] "试试"
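
If the locale change is only needed for one operation, it can be scoped and undone afterwards. The sketch below is one possible way to do that. `with_ctype` is a hypothetical helper name, not a base R function, and the demo uses the "C" locale only because it should be available on any platform; on Windows the "chs" name from the answer would be used instead.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, then restore the previous setting even if `expr` fails.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) when the request fails.
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(res, "")) {
    warning("locale '", locale, "' is not available; LC_CTYPE left unchanged")
  }

  # `expr` is a promise, so it is evaluated here, after the switch.
  force(expr)
}

# Portable demo: report LC_CTYPE from inside the temporary setting.
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))

# On Windows, following the answer (assumes "chs" is installed):
# with_ctype("chs", read.table("r1.csv", sep = ","))
```

Restoring through on.exit keeps the rest of the session in its original locale, which matters because LC_CTYPE also affects how other strings are interpreted.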

Of course, this only applies to the official R console. If you use another IDE, such as the very popular RStudio, you don't need to do this at all to type and display Chinese, even without the Chinese locale loaded.

Some useful points migrated from the comments below:

If the data still fails to display correctly, we should also look at the file encoding. If the file is UTF-8 encoded, either data <- read.table("your_file", sep=',', fileEncoding="UTF-8-BOM", header=TRUE) or fileEncoding="UTF-8" will do, depending on which encoding the file really has.

But you may want to stay away from UTF-8 with BOM, as it is not recommended; see: What's different between UTF-8 and UTF-8 without BOM?
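
One way to decide between the two fileEncoding values is to look at the first three bytes of the file. This is a minimal sketch under the assumption that the file is already known to be UTF-8; a missing BOM does not prove that, since the file could just as well be GBK. `has_utf8_bom` and `pick_utf8_encoding` are hypothetical helper names, and the demo uses temporary files with made-up ASCII content.

```r
# TRUE if the file starts with the UTF-8 byte order mark (EF BB BF).
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Hypothetical helper: choose the fileEncoding value for a file that is
# assumed to be UTF-8 (this does not detect GBK or other encodings).
pick_utf8_encoding <- function(path) {
  if (has_utf8_bom(path)) "UTF-8-BOM" else "UTF-8"
}

# Demo: write the same small CSV with and without a BOM.
with_bom <- tempfile(fileext = ".csv")
without_bom <- tempfile(fileext = ".csv")
body <- charToRaw("id,word\n1,abc\n")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), body), with_bom)
writeBin(body, without_bom)

enc_with <- pick_utf8_encoding(with_bom)
enc_without <- pick_utf8_encoding(without_bom)

# Read the BOM file using the encoding chosen above.
data <- read.table(with_bom, sep = ",", header = TRUE,
                   fileEncoding = enc_with)

unlink(c(with_bom, without_bom))
```

For the file from the question, the call would look like read.table("r1.csv", sep = ",", fileEncoding = pick_utf8_encoding("r1.csv")), again only if the file is in fact UTF-8.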

CT Zhu answered Nov 03 '22 12:11