
RMarkdown UTF-8 error on multiple operating systems

We have a problem using RMarkdown on multiple operating systems.

Initially, an .Rmd file is created on a Linux system (Ubuntu 12.04 LTS) and then pushed to a GitHub repo.

It can be compiled ("knitted") without problems on this system.

It is then pulled on a Windows 7 machine with RStudio installed.

There, when trying to compile, the following error shows up:

Error in yaml::yaml.load(front_matter) : 
  Reader error: invalid leading UTF-8 octet: #FC at 66
Calls: <Anonymous> -> parse_yaml_front_matter -> <Anonymous> -> .Call
Execution halted
  1. When I create another .Rmd file on the Windows system, it compiles flawlessly.
  2. When I create another .Rmd file on the Windows system, copy everything but the first few lines of the "problematic" file into it, and compile that file, it also works flawlessly.

I compared both files in HEX (in Sublime) on both operating systems: They are EXACTLY the same.

Has somebody else seen that error before?

Update: It seems as if a German umlaut ("ü") is causing the problem: its Unicode code point is U+00FC (escaped as \u00FC), according to http://www.endmemo.com/unicode/unicodeconverter.php
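To make the "#FC" in the error message concrete, here is a minimal sketch that compares the bytes of "ü" in UTF-8 and in Latin-1 (which Windows-1252 matches for this character). It only illustrates one plausible reading of the error, namely that a lone 0xFC byte reached the YAML parser; the variable names are made up for the example, and `validUTF8()` requires R 3.3.0 or later.

```r
# "ü" is Unicode code point U+00FC; enc2utf8() guarantees a UTF-8 string
# regardless of the session's locale.
u <- enc2utf8("\u00fc")

# In UTF-8 the character is stored as two bytes: c3 bc.
utf8_bytes <- charToRaw(u)

# In Latin-1 it is stored as the single byte fc.
latin1_bytes <- charToRaw(iconv(u, from = "UTF-8", to = "latin1"))

print(utf8_bytes)
print(latin1_bytes)

# A lone 0xFC byte is not valid UTF-8, which is what a strict UTF-8
# reader would complain about.
lone_fc_is_valid <- validUTF8(rawToChar(latin1_bytes))
print(lone_fc_is_valid)
```

If the parser is handed the Latin-1 form while expecting UTF-8, the first byte it sees for "ü" is 0xFC, matching the octet reported in the error.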

In general, it seems that Unicode is not correctly recognized by either R, RStudio or knitr on Windows. When I type some umlauts into a new .Rmd file and knit it, I get output such as "öää". In RStudio > Tools > Global Options, I set the default text encoding to "UTF-8". I also did that for R, in the Rprofile.site file (options(encoding="UTF-8")).
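The settings described above can be inspected from within a session. The following is a small diagnostic sketch using base R only; the variable names are illustrative, and the printed values will differ between the Windows and Ubuntu machines.

```r
# Connection encoding option; R's documented default is "native.enc".
enc_option <- getOption("encoding")

# Character-type locale, e.g. "German_Switzerland.1252" or "en_US.UTF-8".
ctype <- Sys.getlocale("LC_CTYPE")

# TRUE only when the session's native encoding is UTF-8.
native_is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

cat("encoding option:", enc_option, "\n")
cat("LC_CTYPE:       ", ctype, "\n")
cat("native is UTF-8:", native_is_utf8, "\n")
```

Running this on both systems shows whether the two sessions disagree on the native encoding, independently of what RStudio's text encoding preference says.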

Update 2: library(rmarkdown); sessionInfo() gives

R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_0.4.2

loaded via a namespace (and not attached):
[1] digest_0.6.8    htmltools_0.2.6 tools_3.1.2    

on Windows 7, whereas, on Ubuntu, it is:

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rmarkdown_0.3.10

loaded via a namespace (and not attached):
[1] digest_0.6.8    htmltools_0.2.6 tools_3.1.2   

I already suspect the problem to be the diverging locale... how do I fix this?

asked Jan 16 '15 11:01 by grssnbchr


1 Answer

I am extremely late to this, but I solved the issue by changing the encoding option back to its native default:

options(encoding = "native.enc")  # "native.enc" is R's default: no re-encoding on read

I also changed the default Windows encoding to UTF-8. This opened a Pandora's box of encoding issues in other programs, so treat it with caution.
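Separately from the two changes above, and not something the answer itself tested, one way to check whether a pulled file reads cleanly as UTF-8 without touching the system encoding is to read it over a connection that does no re-encoding. This is a minimal sketch: `read_utf8_lines` is a hypothetical helper name, the demo file is generated on the fly, and `validUTF8()` requires R 3.3.0 or later.

```r
read_utf8_lines <- function(path) {
  # "native.enc" tells the connection not to re-encode the bytes it reads,
  # whatever getOption("encoding") is currently set to.
  con <- file(path, encoding = "native.enc")
  on.exit(close(con))
  # encoding = "UTF-8" only marks the resulting strings as UTF-8.
  readLines(con, encoding = "UTF-8", warn = FALSE)
}

# Demo file: YAML front matter with an umlaut, written as raw UTF-8 bytes.
path <- tempfile(fileext = ".Rmd")
writeBin(charToRaw(enc2utf8("---\ntitle: \"\u00dcbersicht\"\n---\n")), path)

front_matter <- read_utf8_lines(path)

# TRUE when every line is valid UTF-8.
print(all(validUTF8(front_matter)))
```

If this check passes for the real .Rmd file but knitting still fails, the bytes on disk are fine and the re-encoding is more likely happening inside the session.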

answered Nov 15 '22 07:11 by Eudald