Bilingual (English and Portuguese) documentation in an R package

I am writing a package to facilitate importing Brazilian socio-economic microdata sets (Census, PNAD, etc). I foresee two distinct groups of users of the package:

  • Users in Brazil, who may feel more at ease with documentation in Portuguese. They can probably understand English to some extent, but a foreign language would likely make the package feel less "ergonomic".

  • The broader international user community, for whom English documentation may be a necessary condition.

Is it possible to write a package in a way that the documentation is "bilingual" (English and Portuguese), and that the language shown to the user will depend on their country/language settings?

Also,

Is that doable within the roxygen2 documentation framework?

I realise there is a tradeoff between making the package more user-friendly by making it bilingual and the increased complexity and maintenance burden. General comments on this tradeoff from previous experience are also welcome.

EDIT: following the suggestion in the comments, I cross-posted this question to the r-package-devel mailing list (see the thread there and follow the answers at the bottom). Duncan Murdoch posted an interesting answer covering some of what @Brandon's answer (below) covers, but also including two additional suggestions that I think are useful:

  • Write the package in one language, but provide vignettes in different languages. I will follow this advice.

  • Have two versions of the package, say 1.1 and 1.2, one in each language.

LucasMation asked May 18 '16 01:05


1 Answer

According to rOpenSci, there is no standard mechanism for translating package documentation into non-English languages. They describe the typical process of internationalization/localization as follows:

To create non-English documentation requires manual creation of supplemental .Rd files or package vignettes.

Packages supplying non-English documentation should include a Language field in the DESCRIPTION file.

And some more info on the Language field:

A ‘Language’ field can be used to indicate if the package documentation is not in English: this should be a comma-separated list of standard (not private use or grandfathered) IETF language tags as currently defined by RFC 5646 (https://www.rfc-editor.org/rfc/rfc5646, see also https://en.wikipedia.org/wiki/IETF_language_tag), i.e., use language subtags which in essence are 2-letter ISO 639-1 (https://en.wikipedia.org/wiki/ISO_639-1) or 3-letter ISO 639-3 (https://en.wikipedia.org/wiki/ISO_639-3) language codes.

Care is needed if your package contains non-ASCII text, and in particular if it is intended to be used in more than one locale. It is possible to mark the encoding used in the DESCRIPTION file and in .Rd files.
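As a rough illustration of the two quoted points, a DESCRIPTION file for a package documented in Brazilian Portuguese might look like the sketch below. The package name, title and version are hypothetical placeholders; only the Language and Encoding fields matter here, and the value pt-BR assumes the help pages are written in Brazilian Portuguese.

```
Package: microdados
Title: Import Brazilian Socio-Economic Microdata
Version: 0.1.0
Language: pt-BR
Encoding: UTF-8
```

Individual .Rd files can also declare their own encoding with a line such as \encoding{UTF-8}.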

Regarding encoding...

First, consider carefully if you really need non-ASCII text. Many users of R will only be able to view correctly text in their native language group (e.g. Western European, Eastern European, Simplified Chinese) and ASCII. Other characters may not be rendered at all, rendered incorrectly, or cause your R code to give an error. For .Rd documentation, marking the encoding and including ASCII transliterations is likely to do a reasonable job. The set of characters which is commonly supported is wider than it used to be around 2000, but non-Latin alphabets (Greek, Russian, Georgian, …) are still often problematic and those with double-width characters (Chinese, Japanese, Korean) often need specialist fonts to render correctly.

On a related note, R does provide support for translating error and warning messages into different languages: "There are mechanisms to translate the R- and C-level error and warning messages. They are only available if R is compiled with NLS support (which is requested by configure option --enable-nls, the default)."
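To make the message-translation mechanism concrete, here is a minimal sketch of an R-level error written so that it can be translated. The package name "microdados" and the helper check_microdata_file() are hypothetical; the only base R function relied on is gettextf(), which looks the format string up in the message catalog for the given domain (conventionally "R-" followed by the package name) and falls back to the original English text when no translation is installed.

```r
# Hypothetical package name; the translation domain for R-level messages
# is conventionally "R-<package name>".
pkg_domain <- "R-microdados"

# Hypothetical helper: fails with a translatable error message.
check_microdata_file <- function(path) {
  if (!file.exists(path)) {
    # gettextf() looks the format string up in the catalog for `domain`;
    # with no catalog installed it returns the English text unchanged.
    stop(gettextf("file '%s' does not exist", path, domain = pkg_domain),
         call. = FALSE)
  }
  invisible(path)
}

# Usage: capture the error message instead of aborting the script.
msg <- tryCatch(
  check_microdata_file("no_such_file.dta"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If I recall correctly, tools::update_pkg_po() can then be used to generate the translation template for a package and to compile any .po files a translator supplies, but treat that as a pointer to check against the R documentation rather than a full recipe.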

Brandon Loudermilk answered Sep 20 '22 22:09