
Encoding: knitr and child files

I am using Windows 7, R 2.15.3, and RStudio 0.97.320 with knitr 1.1. I'm not sure what my pandoc version is, but I downloaded it a couple of days ago.

sessionInfo()
R version 2.15.3 (2013-03-01) Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Spanish_Argentina.1252  LC_CTYPE=Spanish_Argentina.1252    LC_MONETARY=Spanish_Argentina.1252
[4] LC_NUMERIC=C                       LC_TIME=Spanish_Argentina.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.15.3  

I would like to get my reports both in HTML and Word, so I'm using markdown and pandoc. I write in Spanish, with accents on vowels and tildes on the n: á-ú and ñ.

I have read many posts, and I see that problems similar to mine have been solved by newer versions of knitr. But there is one issue I haven't found a solution for.

When I started, I used the 'system default' encoding that appears in the RStudio dialog, i.e. ISO 8859-1, and the RStudio previews worked great. However, when I tried to get Word documents, pandoc choked on the accented vowels. I found a post showing how to solve this using iconv:

iconv -t utf-8 "myfile.md" | pandoc -o "myfile.docx"| iconv -f utf-8

While this did solve pandoc's complaints about unrecognized UTF-8 characters, for some reason pandoc then stopped finding my plots, with an error like this one:

pandoc: Could not find image `figure/Parent.png', skipping...

If I use only unaccented characters, pandoc finds the images with no problems. I looked at the two .md files with a hex editor, and I can't see any difference when I compare the sections that handle the figures:
![plot of chunk Parent](figure/Parent.png)
although obviously the accented characters are completely different. I have verified that the image files do exist in the figure folder.
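
For reference, the same Latin-1 to UTF-8 re-encoding can be done from R, so that pandoc receives a converted file by name instead of piped input. This is only a minimal sketch under assumptions: the file names are hypothetical temporary files, the input is assumed to be Latin-1, and it is not claimed to resolve the missing-image error described above.

```r
# Hypothetical file names, created here only so the sketch is self-contained.
src <- tempfile(fileext = ".md")
dst <- tempfile(fileext = ".md")

# Simulate a Latin-1 encoded markdown file: "ñ" is the single byte F1.
writeBin(c(charToRaw("a"), as.raw(0xf1), charToRaw("o\n")), src)

# Read the lines, declaring them Latin-1, then convert them to UTF-8.
lines_latin1 <- readLines(src, encoding = "latin1")
lines_utf8 <- iconv(lines_latin1, from = "latin1", to = "UTF-8")

# Write the UTF-8 bytes unchanged to a new file.
writeLines(lines_utf8, dst, useBytes = TRUE)

# With pandoc on the PATH, the converted file could then be passed by name
# (assumption: "myfile.docx" is just an example output name):
# system2("pandoc", c(shQuote(dst), "-o", shQuote("myfile.docx")))
```

After conversion, "ñ" occupies two bytes (C3 B1) instead of one, which is what pandoc expects from UTF-8 input.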

Anyway, after reading many posts I decided to set RStudio to use UTF-8 encoding. With only one level of files things work great. For example, I can -independently- knit and then pandoc into Word the following 2 Rmd files:

Parent   -   SAVED WITH utf-8 encoding in RStudio
========================================================

u with an accent: "ú"  SAVED WITH utf-8 encoding in RStudio

```{r fig.width=7, fig.height=6}
plot(cars, main='Parent ú')
```

and separately:

Child   -   SAVED WITH utf-8 encoding in RStudio
========================================================

u with an accent: "ú"  Child file

```{r fig.width=7, fig.height=6}
plot(cars, main='One File Child ú')
```

and I get two perfect previews in RStudio and two perfect Word documents from pandoc.

The problem arises when I try to call the child part from the parent part. In other words, if I add to the first file the following lines:

```{r CallChild, child='TestUTFChild.Rmd'}

```  

then all the accents in the child file become garbled, as if the UTF-8 were being interpreted as ISO 8859-1. Pandoc stops reading the file as well, complaining that it's not UTF-8.
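
The garbling described here can be reproduced in isolation. The sketch below is illustrative only and does not involve knitr: it takes the UTF-8 bytes of "ú" and deliberately labels them as Latin-1, which turns one character into two.

```r
# "ú" (U+00FA) is stored as the two bytes C3 BA in UTF-8.
utf8_bytes <- charToRaw(enc2utf8("\u00fa"))
print(utf8_bytes)

# Reinterpret the same bytes as Latin-1: each byte becomes its own character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"

# Converting the mislabelled string to UTF-8 yields U+00C3 followed by U+00BA,
# the typical two-character mojibake seen in place of "ú".
garbled <- enc2utf8(misread)
print(garbled)
```

Seeing this two-character pattern in place of each accented letter is a strong hint that UTF-8 text was read under a single-byte encoding somewhere in the pipeline.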

If anybody could point me in the right direction, either:

1. With pandoc not finding the plots if I stay with ISO 8859-1. I have also tried Windows-1252 because it's what I saw in the sessionInfo, but the result is the same.

or

2. With the call to the child file, if UTF-8 is the way to go. I have looked for a way of setting some option to force the encoding in the child call, but I haven't found it yet.

Many thanks!

ap53 asked Oct 22 '22 15:10


1 Answer

I think this problem should be fixed in the latest development version. See instructions in the development repository on how to install the devel version. Then you should be able to choose UTF-8 in RStudio, and get a UTF-8 encoded output file.

Just in case anyone is interested in the gory details: the reason for the earlier failure was that I wrote the child output with the encoding you provided, but did not read it back with the same encoding. Now I just avoid writing output files for child documents.
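
The write/read mismatch described above can be illustrated with base R alone. This is a hedged sketch, not knitr's actual code: `child_text` and the temporary file are hypothetical stand-ins for a child document's output.

```r
# Hypothetical stand-in for a child document's output; not knitr's real code.
child_text <- "u with an accent: \u00fa"
tmp <- tempfile(fileext = ".md")

# Write the text as UTF-8 bytes, without any translation.
writeLines(enc2utf8(child_text), tmp, useBytes = TRUE)

# Mismatched read: the bytes are UTF-8 but get declared as Latin-1.
wrong <- readLines(tmp, encoding = "latin1")

# Matching read: declare the same encoding that was used for writing.
right <- readLines(tmp, encoding = "UTF-8")

print(enc2utf8(wrong))  # the accented character comes back garbled
print(right)            # the accented character survives

unlink(tmp)
```

The `encoding` argument of `readLines()` only marks the strings; it does not convert them. That is why declaring the wrong encoding silently produces garbled text rather than an error.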

Yihui Xie answered Oct 24 '22 10:10