
Brew and knit one PDF report split by variable with special characters (å æ ø) - encoding issue

I am trying to produce one PDF report split into sections based on a grouping variable, using brew and knitr. My grouping variable may contain special characters (umlauts), such as å æ ø.

Umlauts in the document title only are handled fine with \usepackage[utf8]{inputenc} (see examples below). However, umlauts in the grouping variable generate an error with \usepackage[utf8]{inputenc}.

On the other hand, when I try \usepackage[T1]{fontenc}, umlauts in the grouping variable are handled properly, but the title is no longer correctly encoded.

I am struggling to get the encoding right in both the title and the grouping variable.

Here is an example where I try to produce one PDF report with subsections of summary statistics per species in the iris dataset. I hope it may illustrate my problem.

R code to prepare data without umlauts

library(plyr)
library(xtable)
library(knitr)
library(brew)
library(stringr)

Create a summary table for each species in the built-in iris dataset and store the tables in a list. First, use the original Species names, without umlauts, so that the only umlauts are in the document \title (see the code for the .rnw template file).

 data(iris)
 iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))

Define function brew_knit_pdf. The function brews a template latex file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each item/group that is looped over. The xxx_out.rnw from brew is then used as an input file in knit2pdf and is converted to a PDF.

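brew_knit_pdf <- function(template, ...){
  # Derive the output name, e.g. xxx.rnw -> xxx_out.rnw. The dot is escaped and
  # the pattern anchored so that only the file extension is matched.
  brew_out <- str_replace(string = template, pattern = "\\.rnw$", replacement = "_out.rnw")
  # Expand the brew template, then knit the result to a PDF
  brew(file = template, output = brew_out)
  knit2pdf(input = brew_out, ...)
}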

brew_knit_pdf("iris_umlaut_tbl.rnw")

Code for the .rnw template file

In my example, I have named the template file for the following code iris_umlaut_tbl.rnw. This file is used as input in the brew_knit_pdf function in the R script.

\documentclass{article}

% \usepackage[T1]{fontenc}    
\usepackage[utf8]{inputenc}

\usepackage{geometry}
\geometry{tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}

\begin{document}

\begin{titlepage}

\title{Using brew and knitr to produce one PDF report split by a grouping variable.\\Problem with å æ ø in grouping variable}

\clearpage\maketitle
\thispagestyle{empty}

\tableofcontents

\end{titlepage}
\newpage


\section{Summary statistics for each species}

% R code loop wrapped in brew syntax, which brews the template file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each group that is looped over, i.e. the names of the list iris_tbl produced in the R script.

<% for (Sp in names(iris_tbl)) { -%>

\subsection{<%= Sp %>}
<<sum-<%= Sp %>, echo=FALSE, results='asis'>>=
print(iris_tbl[["<%= Sp %>"]])
@
\newpage
<% } %>

\end{document}

R code to prepare data with umlauts

To mimic my real data, I replace the Species names in the iris data with (nonsensical) names that contain umlauts.

data(iris)
iris$Species <- as.character(iris$Species)

iris$Species[iris$Species == "setosa"] <- "åsetosa"
iris$Species[iris$Species == "versicolor"] <- "æversicolor"
iris$Species[iris$Species == "virginica"] <- "øvirginica"

# create a summary table for each species
iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))

When the 'umlaut version' of iris_tbl has been prepared, I run the brew_knit_pdf function on the same .rnw file as above, except that I use different encoding packages (inputenc and/or fontenc).

Result

Here is a summary of four attempts so far; using datasets without or with umlauts, and using different encoding packages in the .rnw file.

Attempt 1

    • The R data: iris_tbl prepared with non-umlaut Species
    • The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}
    • Output: umlauts in title OK

Attempt 2

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}
    • Output:

Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.

Attempt 3

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}, \usepackage[utf8]{inputenc}
    • Output:

Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.

Attempt 4

    • The R data: iris_tbl prepared with umlaut version of Species
    • The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}
    • Output: umlauts in title not OK, umlauts in grouping variable OK
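
A possible reading of the inputenc error above, offered as a sketch and not a confirmed diagnosis: the message would be consistent with the species names reaching the .tex file as single-byte Latin-1 (or CP1252) characters while inputenc expects UTF-8. The base-R snippet below only compares the bytes of "æversicolor" in the two encodings; the variable names are illustrative.

```r
# "æversicolor", built from a Unicode escape so this snippet stays ASCII-only
x <- enc2utf8("\u00e6versicolor")

utf8_bytes   <- charToRaw(x)
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))

utf8_bytes[1:4]    # c3 a6 76 65 : two bytes for æ, then "v" and "e"
latin1_bytes[1:3]  # e6 76 65    : one byte for æ, then "v" and "e"

# 0xE6 is 11100110 in binary. The 1110 prefix is what a UTF-8 decoder treats
# as the lead byte of a three-byte sequence, so it groups e6 76 65 together.
rev(as.integer(intToBits(0xE6))[1:8])  # 1 1 1 0 0 1 1 0
```

Under that assumption, a UTF-8 reader would take the Latin-1 byte for æ together with the following "v" and "e" as one character, which matches the \u8:æve in the error. It would also fit the last attempt: without inputenc the single byte is passed straight to the T1 font, while the UTF-8 title turns into two wrong glyphs.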


Can anyone point me in the right direction to get the encoding right in both the title and the grouping variable? Thanks a lot in advance for taking the time.


Session info

Default text encoding in my RStudio 0.97.336: UTF-8

> sessionInfo()

R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252   
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C                              
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252    

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Hmisc_3.10-1               survival_2.37-4            pastecs_1.3-13             boot_1.3-9                
 [5] pspline_1.0-15             ggplot2_0.9.3.1            lubridate_1.2.0            stringr_0.6.2             
 [9] brew_1.0-6                 knitr_1.1                  xtable_1.7-1               plyr_1.8                  
[13] PerformanceAnalytics_1.1.0 xts_0.9-3                  zoo_1.7-9                  gdata_2.12.0.2            

loaded via a namespace (and not attached):
 [1] cluster_1.14.4     colorspace_1.2-2   dichromat_2.0-0    digest_0.6.3       evaluate_0.4.3     formatR_0.7       
 [7] grid_3.0.0         gtable_0.1.2       gtools_2.7.1       labeling_0.1       lattice_0.20-15    MASS_7.3-26       
[13] memoise_0.1        munsell_0.4        proto_0.3-10       RColorBrewer_1.0-5 reshape2_1.2.2     scales_0.2.3      
[19] tools_3.0.0

> getOption("encoding")

[1] "native.enc"
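
A few further base-R checks that might help narrow down an encoding mismatch like this one. This is a minimal sketch; the sample string sp is only an illustration and stands in for the real group names.

```r
# Is the session's native encoding UTF-8, and what is the character locale?
loc <- l10n_info()
loc[["UTF-8"]]
Sys.getlocale("LC_CTYPE")

# Default encoding used by file connections ("native.enc" means the locale's)
getOption("encoding")

# How R has marked a string; the \u escape keeps this snippet ASCII-only
sp <- "\u00e5setosa"
Encoding(sp)
Encoding(enc2native(sp))
```

In the example from this question, the same check could be run on the group names themselves with Encoding(names(iris_tbl)).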

Update:

I am very grateful for 'off-SO' input from the brew package maintainer, Jeffrey Horner. He had no encoding problems when running my script on Ubuntu with command-line R, which gave me some renewed hope. I cannot run Ubuntu myself, but today I updated RStudio (0.97.449) and set the default encoding to ISO8859-1 (thanks Yihui!). Now the special characters are encoded correctly in both the title and the grouping variable with \usepackage[latin1]{inputenc} in the .rnw file. \usepackage[ansinew]{inputenc} also works. I am not sure what went wrong in my original attempt. Possibly RStudio did not apply the default encoding set in Options, which I changed following Yihui's advice, to the script files when I re-opened them. But that is just speculation.

Henrik asked Apr 23 '13 15:04


1 Answer

Since you are using UTF-8, which is not the native encoding of your OS, you need to explicitly tell knitr the encoding of your input document. For example, you have to call

knit2pdf(brew_out, encoding = "UTF-8")

But I'm not sure if brew can handle non-native character encodings. If not, I suggest you use your system default encoding (should be ISO8859-1 in this case), and

\usepackage[latin9]{inputenc}

Or do everything in knitr if you have to use UTF-8 (this also enables you to click the button to compile the document); see 075-knit-expand.Rnw for an example.
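
If the brewed file does end up with a mix of encodings (for example a UTF-8 title copied from the template next to group names written in the native encoding), one possible workaround is to normalise the file to UTF-8 before knitting. The sketch below is not part of brew or knitr: fix_mixed_encoding is a hypothetical helper, it assumes that every line that is not valid UTF-8 is Latin-1, and validUTF8() needs R 3.3.0 or later.

```r
# Hypothetical helper: re-encode every line that is not valid UTF-8,
# assuming such lines are in `from` (Latin-1 by default; an assumption).
fix_mixed_encoding <- function(path, from = "latin1") {
  lines <- readLines(path, warn = FALSE)      # bytes are kept as read
  bad <- !validUTF8(lines)                    # requires R >= 3.3.0
  lines[bad] <- iconv(lines[bad], from = from, to = "UTF-8")
  writeLines(lines, path, useBytes = TRUE)    # write the bytes unchanged
  invisible(sum(bad))                         # number of lines converted
}

# Demo on a temporary file: line 1 holds a UTF-8 æ, line 2 a Latin-1 æ
demo_file <- tempfile(fileext = ".rnw")
writeBin(c(charToRaw("\\title{"), as.raw(c(0xc3, 0xa6)), charToRaw("}\n"),
           charToRaw("\\subsection{"), as.raw(0xe6), charToRaw("versicolor}\n")),
         demo_file)

n_fixed <- fix_mixed_encoding(demo_file)
all(validUTF8(readLines(demo_file, warn = FALSE)))
```

In the question's wrapper, such a helper could be called on brew_out between brew() and knit2pdf(), with \usepackage[utf8]{inputenc} in the template. A Latin-1 line that happens to be valid UTF-8 would be left untouched, so this is a heuristic and not a guarantee.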

Yihui Xie answered Sep 30 '22 15:09