Brew and knit one PDF report split by variable with special characters (å æ ø) - encoding issue

I am trying to produce one PDF report split into sections based on a grouping variable, using brew and knitr. My grouping variable may contain special characters (umlauts), such as å æ ø.

Umlauts in the document title only are handled fine with \usepackage[utf8]{inputenc} (see examples below). However, umlauts in the grouping variable generate an error with \usepackage[utf8]{inputenc}.

On the other hand, when I use \usepackage[T1]{fontenc} instead, umlauts in the grouping variable are handled properly, but the title is no longer encoded correctly.

I am struggling to get encoding right in both title and grouping variable.

Here is an example where I try to produce one PDF report with subsections of summary statistics per species in the iris dataset. I hope it may illustrate my problem.

R code to prepare data without umlauts


Create a summary table for each species in the built-in iris dataset. First, use the original Species names, without umlauts, so that umlauts occur only in the document \title (see the code for the .rnw template file). Store the summary tables in a list.

library(plyr); library(xtable)  # provide dlply() and xtable()
iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))

Define function brew_knit_pdf. The function brews a template latex file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each item/group that is looped over. The xxx_out.rnw from brew is then used as an input file in knit2pdf and is converted to a PDF.

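# requires the stringr, brew and knitr packages
brew_knit_pdf <- function(template, ...){
  # derive the output name xxx.rnw -> xxx_out.rnw (escape the dot, anchor at the end)
  brew_out <- str_replace(string = template, pattern = "\\.rnw$", replacement = "_out.rnw")
  brew(file = template, output = brew_out)   # expand the brew loop into a new .rnw file
  knit2pdf(input = brew_out, ...)            # knit the brewed file and compile it to PDF
}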

Code for the .rnw template file

In my example, I have named the template file for the following code iris_umlaut_tbl.rnw. This file is used as input in the brew_knit_pdf function in the R script.


\documentclass{article}
\usepackage[utf8]{inputenc}
% \usepackage[T1]{fontenc}

\begin{document}

\title{Using brew and knitr to produce one PDF report split by a grouping variable.\\Problem with å æ ø in grouping variable}
\maketitle

\section{Summary statistics for each species}

% R code loop wrapped in brew syntax, which brews the template file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each group that is looped over, i.e. the names of the list iris_tbl produced in the R script.

<% for (Sp in names(iris_tbl)) { -%>

\subsection{<%= Sp %>}
<<sum-<%= Sp %>, echo=FALSE, results='asis'>>=
print(iris_tbl[["<%= Sp %>"]])
@
<% } %>

\end{document}
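
To make it easier to see what the brewed file contains, here is a minimal base-R sketch that emulates the loop with sprintf(). It is not how brew works internally, and expand_group is a hypothetical helper name used only for illustration. It shows that each group name ends up both in a \subsection title and in a knitr chunk label, so any special character in the name appears in both places in xxx_out.rnw.

```r
# Hypothetical helper (not part of brew or knitr): builds the lines that
# the brew loop would emit for one group name.
expand_group <- function(sp) {
  c(sprintf("\\subsection{%s}", sp),
    sprintf("<<sum-%s, echo=FALSE, results='asis'>>=", sp),
    sprintf("print(iris_tbl[[\"%s\"]])", sp),
    "@")
}

# Use the original (ASCII) species names from the built-in iris data
groups <- levels(iris$Species)

# One block of four lines per group, concatenated in order
out <- unlist(lapply(groups, expand_group))

# Show the generated .rnw fragment
cat(out, sep = "\n")
```

With the umlaut version of the data, the same lines would contain å, æ and ø, which is why the encoding of the brewed file matters.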


R code to prepare data with umlauts

To mimic my real data, I replace the Species names in the iris data with (nonsensical) names that contain umlauts.

iris$Species <- as.character(iris$Species)

iris$Species[iris$Species == "setosa"] <- "åsetosa"
iris$Species[iris$Species == "versicolor"] <- "æversicolor"
iris$Species[iris$Species == "virginica"] <- "øvirginica"

# create a summary table for each species
iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))
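
Since the behaviour depends on how these names are stored, it may help to inspect their encoding before brewing. The following is a small base-R sketch; the vector name sp and the use of \u escapes (to keep the example independent of the script file's own encoding) are assumptions for illustration, not part of the original script.

```r
# The three names, written with \u escapes so the example does not depend
# on the encoding of the script file itself
sp <- c("\u00e5setosa", "\u00e6versicolor", "\u00f8virginica")

# Declared encoding of each string
print(Encoding(sp))

# The same names converted to latin1 (a close relative of Windows-1252)
sp_latin1 <- iconv(sp, from = "UTF-8", to = "latin1")

# UTF-8 needs two bytes for each of these letters, latin1 only one
print(nchar(sp, type = "bytes"))
print(nchar(sp_latin1, type = "bytes"))

# First byte of the latin1 version of the second name
print(charToRaw(sp_latin1[2])[1])
```

If the byte counts of the names in the real data match the latin1 row rather than the UTF-8 row, the strings are probably in the native Windows encoding and not in UTF-8.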

When the 'umlaut version' of iris_tbl has been prepared, I run the brew_knit_pdf function on the same .rnw file as above, except that I use different encoding packages (inputenc and/or fontenc).


Here is a summary of four attempts so far; using datasets without or with umlauts, and using different encoding packages in the .rnw file.

Attempt 1:

    • The R data: iris_tbl prepared with non-umlaut Species
    • The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}

Output: umlauts in title OK

Attempt 2:

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}

Output:

Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.

Attempt 3:

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}, \usepackage[utf8]{inputenc}

Output:

Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.

Attempt 4:

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}

Output: umlauts in title not OK, umlauts in grouping variable OK
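
One plausible reading of the inputenc error (an interpretation, not something confirmed in this thread) is that the group names reach the .tex file as single latin1 bytes, while \usepackage[utf8]{inputenc} expects UTF-8. The latin1 byte for æ is 0xE6, which in UTF-8 announces a three-byte sequence, so the following "v" and "e" would be consumed with it, matching the "\u8:æve" in the message. The base-R sketch below checks that byte-level claim; the variable names are assumptions for illustration.

```r
# "æversicolor" as latin1 bytes (assumed to be what ends up in the .tex file)
ae_latin1 <- iconv("\u00e6versicolor", from = "UTF-8", to = "latin1")
bytes <- charToRaw(ae_latin1)

# In UTF-8, a byte of the form 1110xxxx starts a three-byte sequence
lead <- as.integer(bytes[1])
is_three_byte_lead <- bitwAnd(lead, 0xF0L) == 0xE0L

# The next two bytes would have to look like 10xxxxxx to be continuations
is_continuation <- bitwAnd(as.integer(bytes[2:3]), 0xC0L) == 0x80L

print(is_three_byte_lead)
print(is_continuation)

# validUTF8() (base R >= 3.3.0) inspects the raw bytes, ignoring the mark
print(validUTF8(ae_latin1))
```

If this reading is right, the title and the group names were arriving in two different encodings, which would explain why no single inputenc option handled both.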

Can anyone point me in the right direction to get the encoding right in both title and grouping variable? Thanks a lot in advance for taking your time.

Session info

Default text encoding in my R Studio 0.97.336: UTF-8

> sessionInfo()

R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252   
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C                              
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252    

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Hmisc_3.10-1               survival_2.37-4            pastecs_1.3-13             boot_1.3-9                
 [5] pspline_1.0-15             ggplot2_0.9.3.1            lubridate_1.2.0            stringr_0.6.2             
 [9] brew_1.0-6                 knitr_1.1                  xtable_1.7-1               plyr_1.8                  
[13] PerformanceAnalytics_1.1.0 xts_0.9-3                  zoo_1.7-9                  gdata_2.12.0.2            

loaded via a namespace (and not attached):
 [1] cluster_1.14.4     colorspace_1.2-2   dichromat_2.0-0    digest_0.6.3       evaluate_0.4.3     formatR_0.7       
 [7] grid_3.0.0         gtable_0.1.2       gtools_2.7.1       labeling_0.1       lattice_0.20-15    MASS_7.3-26       
[13] memoise_0.1        munsell_0.4        proto_0.3-10       RColorBrewer_1.0-5 reshape2_1.2.2     scales_0.2.3      
[19] tools_3.0.0

> getOption("encoding")

[1] "native.enc"


I am very grateful for 'off-SO' input from the brew package maintainer, Jeffrey Horner. He had no encoding problems when running my script on Ubuntu with command-line R, which gave me some renewed hope. I cannot run Ubuntu myself, but today I updated RStudio (0.97.449) and set the default encoding to ISO8859-1 (thanks Yihui!). Now the special characters are encoded correctly in both the title and the grouping variable with \usepackage[latin1]{inputenc} in the .rnw file. \usepackage[ansinew]{inputenc} also works. I am not sure what went wrong in my original attempt. Possibly RStudio did not apply the default encoding set in Options, which I had changed following Yihui's advice, to the script files when I re-opened them. But that is just speculation.

Henrik asked Apr 23 '13 15:04


1 Answer

Since you are using UTF-8, which is not the native encoding of your OS, you need to explicitly tell knitr the encoding of your input document. For example, you have to call

knit2pdf(brew_out, encoding = "UTF-8")

But I'm not sure if brew can handle non-native character encodings. If not, I suggest you use your system default encoding (should be ISO8859-1 in this case), and \usepackage[latin1]{inputenc} in the .rnw file.


Or do everything in knitr if you have to use UTF-8 (this also enables you to click the button to compile the document); see 075-knit-expand.Rnw for an example.
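
As a rough illustration of what declaring the encoding explicitly means at the file level, here is a base-R sketch that writes one line as UTF-8 bytes and reads it back with the encoding stated. It does not call brew or knitr; the temporary file and the variable names are assumptions for illustration only.

```r
# A line such as the brewed .rnw file might contain
txt <- "\\subsection{\u00e6versicolor}"

# Write the exact UTF-8 bytes, independent of the native locale
tmp <- tempfile(fileext = ".rnw")
con <- file(tmp, open = "wb")
writeLines(enc2utf8(txt), con, useBytes = TRUE)
close(con)

# Read the file back, declaring that its contents are UTF-8
back <- readLines(tmp, encoding = "UTF-8")

# The round trip preserves the string only because the encoding was declared
print(identical(back, txt))
print(Encoding(back))

unlink(tmp)
```

On a system whose native encoding is not UTF-8, leaving out the encoding argument when reading would leave the bytes to be interpreted in the native encoding, which is the kind of mismatch described in the answer.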

Yihui Xie answered Sep 30 '22 15:09
