How to source() an .R file saved using UTF-8 encoding?

The following, when copied and pasted directly into R, works fine:

> character_test <- function() print("R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示...")
> character_test()
[1] "R同时也被称为GNU S是一个强烈的功能性语言和环境,探索统计数据集,使许多从自定义数据图形显示..."

However, if I make a file called character_test.R containing exactly the same code and save it with UTF-8 encoding (to retain the Chinese characters), then when I source() it in R I get the following error:

> source(file="C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8")
Error in source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "utf-8") :
  C:\Users\Tony\Desktop\character_test.R:3:0: unexpected end of input
1: character.test <- function() print("R
2:
   ^
In addition: Warning message:
In source(file = "C:\\Users\\Tony\\Desktop\\character_test.R", encoding = "UTF-8") :
  invalid input found on input connection 'C:\Users\Tony\Desktop\character_test.R'

Any help solving this, and understanding what is going on here, would be much appreciated.

> sessionInfo() # Windows 7 Pro x64
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_2.12.1

and

> l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252
Tony Breyal asked Feb 17 '11 16:02


1 Answer

On R for Windows, source() runs into problems with any UTF-8 characters that can't be represented in the current locale (the "ANSI code page", in Windows terms). Unfortunately, Windows doesn't offer UTF-8 as an ANSI code page: Windows has a technical limitation that ANSI code pages can only be one- or two-byte-per-character encodings, not variable-width encodings like UTF-8.
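To see the limitation concretely, you can try converting a few of the question's characters into code page 1252 (the `codepage` reported by `l10n_info()` above) with base R's `iconv()`. This is a small illustration, not a diagnosis of `source()` itself; `to_cp1252` is a hypothetical helper name, and it assumes your platform's iconv accepts the encoding name `"CP1252"` (the common implementations do). With its default `sub = NA`, `iconv()` returns `NA` for any string it cannot represent in the target encoding.

```r
# The question's locale uses code page 1252 (see l10n_info() above).
# \u escapes keep this snippet ASCII-only; they encode "R同时" as UTF-8.
chinese <- "R\u540c\u65f6"
ascii   <- "R is also known as GNU S"

# Hypothetical helper: convert UTF-8 text to Windows code page 1252.
# With the default sub = NA, iconv() returns NA for any element
# that cannot be represented in the target encoding.
to_cp1252 <- function(x) iconv(x, from = "UTF-8", to = "CP1252")

is.na(to_cp1252(chinese))  # TRUE: these characters have no CP1252 equivalent
is.na(to_cp1252(ascii))    # FALSE: plain ASCII converts cleanly
```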

This doesn't seem to be a fundamental, unsolvable problem; there's just something wrong with the source() function itself. You can get 90% of the way there by doing this instead:

filename <- "C:\\Users\\Tony\\Desktop\\character_test.R"  # path from the question
eval(parse(filename, encoding = "UTF-8"))

This works almost exactly like source() with default arguments, but doesn't support options such as echo = TRUE or print.eval = TRUE.
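If you miss `source()`'s echoing, the same `eval(parse())` idea can be wrapped in a small helper that evaluates one top-level expression at a time. The sketch below is hypothetical (`source_utf8` is not part of base R) and only roughly imitates `source(echo = TRUE, print.eval = TRUE)`; it does not replicate `source()`'s other arguments, and it returns the last value rather than the `list(value, visible)` that `source()` returns.

```r
# Hypothetical helper (not part of base R): a rough stand-in for
# source(file, encoding = "UTF-8"), with an optional echo that imitates
# source(echo = TRUE, print.eval = TRUE).
source_utf8 <- function(file, envir = globalenv(), echo = FALSE) {
  # parse() accepts an encoding argument; "UTF-8" marks the parsed
  # strings as UTF-8 instead of assuming the native code page.
  exprs <- parse(file, encoding = "UTF-8")
  result <- NULL
  for (expr in exprs) {
    if (echo) {
      # Print the expression with a prompt, roughly like source(echo = TRUE)
      cat(paste0("> ", deparse(expr)), sep = "\n")
    }
    # withVisible() reports whether the value would auto-print at the console
    res <- withVisible(eval(expr, envir))
    if (echo && res$visible) print(res$value)
    result <- res$value
  }
  # source() returns list(value, visible); this sketch returns only the
  # value of the last expression, invisibly.
  invisible(result)
}
```

For example, `source_utf8("C:\\Users\\Tony\\Desktop\\character_test.R", echo = TRUE)` would parse and evaluate the question's file, echoing each top-level expression as it goes.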

Joe Cheng answered Sep 28 '22 09:09