Is it possible to write documentation in R using non-ASCII characters (such as å, ä, ö) with roxygen2? I'm asking because I am writing a package with internal functions documented in Swedish.
I have used the following roxygen code to write the documentation:
#' @param data data frame där variablen finns
#' @param x variabeln, måste vara en av typen character
This results in the non-ASCII characters being garbled in the generated .Rd files. I can fix the .Rd files manually, but I'd rather not.
I solved this problem by putting
##' @encoding UTF-8
in the roxygen2 documentation comment and then typing
options(encoding = "UTF-8")
in the R console before roxygenizing. For future sessions, it is helpful to add the line
options(encoding = "UTF-8")
to the Rprofile.site file in the etc directory of your R installation (R_HOME/etc/Rprofile.site).
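Rather than editing Rprofile.site, you could also set the option only for the duration of a single documentation run. A minimal sketch, assuming roxygen2 is installed; roxygenise_utf8() and its document argument are hypothetical names for illustration, not part of roxygen2 or devtools:

```r
# Hypothetical wrapper (not part of roxygen2 or devtools): temporarily set
# options(encoding = "UTF-8"), run the documentation step, then restore the
# previous value, even if documenting fails.
roxygenise_utf8 <- function(pkg = ".", document = roxygen2::roxygenise) {
  old <- options(encoding = "UTF-8")  # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them when the function exits
  document(pkg)
}

# Usage (assumes the working directory is the package root):
# roxygenise_utf8(".")
# roxygenise_utf8(".", document = devtools::document)
```

Passing the documenting function as an argument keeps the wrapper usable with devtools::document(), whose first argument is also the package path. The default roxygen2::roxygenise is only looked up when the wrapper is actually called.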
On Windows, encoding in R is a pain and very complicated, and the people developing the packaging tools (see roxygen or devtools) don't always treat it as a real issue. What worked for me:
If you have data in your package with non-ASCII labels, e.g. a color vector c(rød = "#C30000", blå = "#00A9E0"), you have to escape the names/values in the code (the names must be quoted, because \u escapes only work inside strings):
c("r\u00f8d" = "#C30000", "bl\u00e5" = "#00A9E0")
In the documentation (if you use roxygenize() or devtools::document()), you have to place @encoding UTF-8 in EVERY function's documentation block, but you can then type the characters normally on the keyboard.
If you have two functions in the same file (e.g. "palette" and "saturation" in a design package for your organisation), you have to place the tag in every description block, not just once.
Example:
#' @encoding UTF-8
#' datastruktur for å definere firmapalett med æøå
dummypalett <- structure(.Data = c("#c30000", "#00A9E0"),
names = c("r\u00f8d", "bl\u00e5"))
#' @encoding UTF-8
#' neste funksjon som er beskrevet med æøåäö
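Because the tag has to be repeated in every block, a missing one is easy to overlook when a file contains several functions. A rough sketch of a check, assuming a block is simply a run of consecutive #' or ##' lines; blocks_missing_encoding() is a hypothetical helper, not part of roxygen2, and it ignores features such as @rdname that merge several blocks into one .Rd file:

```r
# Hypothetical helper (not part of roxygen2): return the first line number of
# every roxygen block (a run of consecutive #' or ##' lines) that has no
# @encoding tag.
blocks_missing_encoding <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  is_rox <- grepl("^[[:space:]]*#+'", lines)  # roxygen comment lines
  runs <- rle(is_rox)                         # contiguous runs of TRUE/FALSE
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  missing <- integer(0)
  for (i in which(runs$values)) {             # only the roxygen runs
    block <- lines[starts[i]:ends[i]]
    if (!any(grepl("@encoding", block, fixed = TRUE))) {
      missing <- c(missing, starts[i])
    }
  }
  missing
}

# Usage (hypothetical file name):
# blocks_missing_encoding("R/palett.R")
```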
For good measure, I put Language: nob (Norwegian Bokmål) in the DESCRIPTION file and changed the encoding setting in Rprofile to "UTF-8".
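Writing R Extensions also describes an Encoding field in DESCRIPTION: it declares the encoding of the DESCRIPTION file itself and of the package's R files, and serves as the default encoding of its .Rd files. A small sketch using base R's read.dcf() to see what a package declares; desc_fields() and has_utf8_encoding() are hypothetical helper names:

```r
# Hypothetical helper: read the Encoding and Language fields from a
# DESCRIPTION file. read.dcf() fills fields that are absent with NA.
desc_fields <- function(path = "DESCRIPTION") {
  m <- read.dcf(path, fields = c("Encoding", "Language"))
  m[1, ]  # named character vector: Encoding, Language
}

# Hypothetical helper: TRUE only if DESCRIPTION says "Encoding: UTF-8".
has_utf8_encoding <- function(path = "DESCRIPTION") {
  identical(unname(desc_fields(path)["Encoding"]), "UTF-8")
}

# Usage, from the package root:
# has_utf8_encoding()
```

If it returns FALSE, adding the line Encoding: UTF-8 to DESCRIPTION is the usual fix.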
Non-ASCII characters are tricky to use in R packages. From Writing R Extensions
(https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Package-subdirectories):
Only ASCII characters (and the control characters tab, formfeed, LF and CR) should be used in code files. Other characters are accepted in comments, but then the comments may not be readable in e.g. a UTF-8 locale. Non-ASCII characters in object names will normally fail when the package is installed. Any byte will be allowed in a quoted character string but \uxxxx escapes should be used for non-ASCII characters. However, non-ASCII character strings may not be usable in some locales and may display incorrectly in others.
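Following that advice by hand is tedious, so here is a sketch of two hypothetical helpers (not part of base R or any package): find_non_ascii_lines() reports which lines of a file contain non-ASCII characters, and escape_non_ascii() rewrites those characters as \uxxxx escapes (or \Uxxxxxxxx beyond U+FFFF) that can be pasted into quoted strings. Both assume the files are saved as UTF-8. For a quick look without extra code, base R's tools::showNonASCIIfile() prints the offending lines of a file.

```r
# Hypothetical helper: rewrite non-ASCII characters as \uxxxx / \Uxxxxxxxx
# escape text. Returns NA for strings that are not valid UTF-8.
escape_non_ascii <- function(x) {
  vapply(enc2utf8(x), function(s) {
    cps <- utf8ToInt(s)                       # Unicode code points
    if (anyNA(cps)) return(NA_character_)     # invalid UTF-8 input
    chars <- intToUtf8(cps, multiple = TRUE)  # one string per character
    esc <- cps > 127                          # outside the ASCII range
    if (any(esc)) {
      fmt <- ifelse(cps[esc] > 0xFFFF, "\\U%08x", "\\u%04x")
      chars[esc] <- sprintf(fmt, cps[esc])
    }
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: line numbers of a file that contain non-ASCII
# characters (invalid UTF-8 is counted as non-ASCII too).
find_non_ascii_lines <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  has_non_ascii <- vapply(lines, function(l) {
    cps <- utf8ToInt(l)
    anyNA(cps) || any(cps > 127)
  }, logical(1), USE.NAMES = FALSE)
  which(has_non_ascii)
}

# Usage (hypothetical file name):
# f <- "R/palett.R"
# lines <- readLines(f, encoding = "UTF-8")
# for (i in find_non_ascii_lines(f)) cat(i, escape_non_ascii(lines[i]), "\n")
```

The escaped text only means the same thing inside quoted strings, so names have to be quoted as well, e.g. c("bl\u00e5" = "#00A9E0").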
For documentation you have to add the tag @encoding UTF-8
to your roxygen2 code.
You can then check the generated .Rd file for problems
with tools::checkRd():
path <- "path to Rd file"
tools::checkRd(path)
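To check a whole package at once, here is a sketch that runs tools::checkRd() on every file in man/ and, separately, lists .Rd files that contain non-ASCII bytes but no \encoding{} line (with @encoding UTF-8, roxygen2 should write one into each generated file). check_all_rd() and rd_missing_encoding() are hypothetical helper names; if DESCRIPTION already declares Encoding: UTF-8, files flagged by the second helper may still be fine.

```r
# Hypothetical helper: run tools::checkRd() on every .Rd file in man_dir and
# keep only the files that produced messages.
check_all_rd <- function(man_dir = "man") {
  rd_files <- list.files(man_dir, pattern = "[.]Rd$", full.names = TRUE)
  results <- lapply(rd_files, tools::checkRd)
  names(results) <- basename(rd_files)
  Filter(length, results)  # drop files with no messages
}

# Hypothetical helper: .Rd files with non-ASCII bytes but no \encoding{} line.
rd_missing_encoding <- function(man_dir = "man") {
  rd_files <- list.files(man_dir, pattern = "[.]Rd$", full.names = TRUE)
  flagged <- vapply(rd_files, function(f) {
    bytes <- readBin(f, "raw", file.size(f))
    non_ascii <- any(as.integer(bytes) > 127L)
    declares <- any(grepl("\\encoding{", readLines(f, warn = FALSE),
                          fixed = TRUE, useBytes = TRUE))
    non_ascii && !declares
  }, logical(1), USE.NAMES = FALSE)
  basename(rd_files[flagged])
}

# Usage, from the package root after roxygenizing:
# check_all_rd()
# rd_missing_encoding()
```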