Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a knitr option to force UTF-8 encoding in included R files?

I am using Windows 7, R2.15.3 and RStudio 0.97.320 with knitr knitr_1.1.6 (downloaded after Yihui fixed the 'Encoding: knitr and child files' issue on March 12)

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Argentina.1252  LC_CTYPE=Spanish_Argentina.1252    LC_MONETARY=Spanish_Argentina.1252
[4] LC_NUMERIC=C                       LC_TIME=Spanish_Argentina.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lattice_0.20-13    pixmap_0.4-11      RColorBrewer_1.0-5 ade4_1.5-1         pander_0.3.1      
[6] xtable_1.7-1      

loaded via a namespace (and not attached):
[1] digest_0.6.3   evaluate_0.4.3 formatR_0.7    grid_2.15.3    knitr_1.1.6    stringr_0.6.2  tools_2.15.3 

I have my R code in a file like this one:

## @knitr RunMyCode 
print('Called from .R file: á é í ó ú ñ')  

# Workaround
my.text <- 'á é í ó ú ñ'
Encoding(my.text) <- "UTF-8"
print(my.text)

I call it from a Rmd file such as this:

Title
========================================================
Spanish text: á é í ó ú ñ

Use it from .Rmd code: it comes out right...
```{r}
print('á é í ó ú ñ')
```

```{r ReadFunctions, cache=FALSE, echo=TRUE, eval=TRUE}
read_chunk('TestSpanishText.R')
```

Spanish text comes out garbled here:
```{r RunMyCode, echo=TRUE, eval=TRUE, cache=TRUE, dependson=c('ReadFunctions')}
``` 

My problem is with spanish characters typed in the .R file (which is UTF-8 encoded in RStudio). These characters are OK if typed in Rmd files (both parent and child files work well), but not in R files. As you can see below, Encoding() does provide a workaround, but I wonder if there is another way, such as a global option? If I use Encoding(), I get the inverse problem in the RStudio console...

Title

Spanish text: á é í ó ú ñ

Use it from .Rmd code: it comes out right...

print("á é í ó ú ñ")
## [1] "á é í ó ú ñ"
read_chunk("TestSpanishText.R")

Spanish text comes out garbled here:    
print("Called from .R file: á é í ó ú ñ")
## [1] "Called from .R file: á é í ó ú ñ"

# Workaround
my.text <- "á é í ó ú ñ"
Encoding(my.text) <- "UTF-8"
print(my.text)
## [1] "á é í ó ú ñ"

Thank you!

like image 532
ap53 Avatar asked Mar 29 '13 12:03

ap53


People also ask

How do I change my encoding to UTF-8?

UTF-8 Encoding in Notepad (Windows)Click File in the top-left corner of your screen. In the dialog which appears, select the following options: In the "Save as type" drop-down, select All Files. In the "Encoding" drop-down, select UTF-8.

What is the default encoding for R?

edited the Rprofile file encoding line to 'options(encoding = "utf-8")' now R is using utf-8 as default encoding.

What is UTF encoding scheme?

UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”


1 Answers

Ideally I should have an encoding argument in read_chunk(), but since you were using UTF-8, this probably works:

read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8"))

Please try this first. If it does not work, I will add an encoding argument. Anyway, I'm sure this definitely works (it is just slightly longer):

con = file("TestSpanishText.R", encoding = "UTF-8")
read_chunk(con)
close(con)
like image 174
Yihui Xie Avatar answered Oct 01 '22 10:10

Yihui Xie