
What is a reliable way of getting allowed locale names in R?


I'm trying to find a reliable way of finding locale codes to pass to Sys.setlocale.

The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:

Sys.setlocale("LC_TIME", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8")   # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8")  # ditto
Sys.setlocale("LC_TIME", "de_DE")  # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows

Under Linux, the possibilities can be retrieved using

locales <- system("locale -a", intern = TRUE)
##  [1] "C"                    "C.utf8"               "POSIX"               
##  [4] "af_ZA"                "af_ZA.utf8"           "am_ET"
##  ...

I don't have Solaris or Mac machines to hand, but I guess that the names they accept can be derived from that output using something like:

library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1])    #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1])  #Mac
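
For anyone without stringr installed, the same two transformations can be sketched in base R. This is only a minimal sketch: `strip_locale` and its `keep_territory` argument are hypothetical names, the input is a hard-coded sample rather than real `locale -a` output, and whether the stripped names are actually accepted on Solaris or Mac is still the guess made above.

```r
# Hypothetical helper mirroring the two stringr calls above.
# keep_territory = TRUE  : drop everything from the first "." (encoding suffix)
# keep_territory = FALSE : also drop everything from the first "_" (territory)
strip_locale <- function(x, keep_territory = TRUE) {
  x <- sub("\\..*$", "", x)
  if (!keep_territory) {
    x <- sub("_.*$", "", x)
  }
  unique(x)
}

# Sample values copied from the `locale -a` output shown earlier
locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")

strip_locale(locales)                          # names without encoding
strip_locale(locales, keep_territory = FALSE)  # language codes only
```

On a Unix-alike, `locales` could instead come from `system("locale -a", intern = TRUE)` as in the question.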

Locales on Windows are much more problematic: they require long names of the form “language_country”, for example:

Sys.setlocale("LC_ALL", "German_Germany")

I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless Cygwin is installed, in which case it returns the same values as under Linux (I'm guessing it reads them from a standard C library).

There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab, which contains time zone details).

My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table.

http://msdn.microsoft.com/en-us/library/dd318693.aspx

Some guesswork is needed; for example, the locale related to SUBLANG_ENGLISH_UK is English_United Kingdom.

Sys.setlocale("LC_ALL", "English_United Kingdom")

Where there are variants in different alphabets, parentheses are needed.

Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")

This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.

Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
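
Rather than trying names one at a time, candidates can be probed in a loop. The following is a minimal sketch, not an established API: `locale_works` is a hypothetical helper, and it relies on Sys.setlocale returning an empty string (with a warning) when the OS rejects the request. It tests a single category (LC_TIME, an assumption made here for simplicity) so that the previous value can be restored from a plain Sys.getlocale(category) call.

```r
# Hypothetical helper: TRUE if the OS accepts `name` for `category`.
# The previous setting for that category is restored on exit.
locale_works <- function(name, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  # Sys.setlocale() returns "" (and warns) when the request cannot be honoured
  nzchar(suppressWarnings(Sys.setlocale(category, name)))
}

# Candidate names from the question, plus "C" as a known-good control
candidates <- c("Hindi_India", "Tamil_India", "Sindhi_Pakistan",
                "Nynorsk_Norway", "Amharic_Ethiopia", "C")
accepted <- vapply(candidates, locale_works, logical(1))
accepted
```

The result is a named logical vector, so `names(accepted)[accepted]` gives the names that worked on the current machine.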

The Windows Region and Language dialog box (Windows\System32\intl.cpl, see pic) has a similar but not identical list of available locales, but I don't know where that is populated from.

[Image: the Windows Region and Language dialog box]

There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?

Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example

Sys.setlocale(, "Inuktitut (Latin)_Canada")

is fine, but

Sys.setlocale(, "Inuktitut (Syllabics)_Canada")

fails (as do most of the Indian languages). Starting R in any of these locales produces a warning, and R's locale reverts to C.

I'm still interested to hear from any Indians, etc., as to what locale you have.

Richie Cotton asked Jan 06 '14 22:01




1 Answer

In answer to your first question, here's the output on my Mac:

> locales <- system("locale -a", intern = TRUE)
> library(stringr)
> unique(str_split_fixed(locales, "\\.", 2)[, 1]) 
 [1] "af_ZA" "am_ET" "be_BY" "bg_BG" "ca_ES" "cs_CZ" "da_DK" "de_AT" "de_CH"
[10] "de_DE" "el_GR" "en_AU" "en_CA" "en_GB" "en_IE" "en_NZ" "en_US" "es_ES"
[19] "et_EE" "eu_ES" "fi_FI" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "he_IL" "hi_IN"
[28] "hr_HR" "hu_HU" "hy_AM" "is_IS" "it_CH" "it_IT" "ja_JP" "kk_KZ" "ko_KR"
[37] "lt_LT" "nl_BE" "nl_NL" "no_NO" "pl_PL" "pt_BR" "pt_PT" "ro_RO" "ru_RU"
[46] "sk_SK" "sl_SI" "sr_YU" "sv_SE" "tr_TR" "uk_UA" "zh_CN" "zh_HK" "zh_TW"
[55] "C"     "POSIX"

I'm not sure what I'm expecting to see with Sys.setlocale() but it doesn't throw any errors:

> Sys.setlocale(locale="he_IL")
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
> Sys.getlocale()
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
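
To see which category ended up with which locale without picking apart the combined string, each category can be queried separately. A minimal sketch: the five category names are the ones ?Sys.setlocale lists for all platforms, and `per_category` is just an illustrative variable name.

```r
# Query each locale category on its own instead of parsing the LC_ALL string
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
per_category <- vapply(categories, Sys.getlocale, character(1))
per_category
```

The result is a character vector named by category, which is easier to compare before and after a Sys.setlocale call.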
Scott Ritchie answered Oct 27 '22 02:10
