I'm trying to find a reliable way of finding locale codes to pass to Sys.setlocale.
The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:
Sys.setlocale("LC_TIME", "de") # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8") # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8") # ditto
Sys.setlocale("LC_TIME", "de_DE") # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows
Under Linux, the possibilities can be retrieved using
locales <- system("locale -a", intern = TRUE)
## [1] "C" "C.utf8" "POSIX"
## [4] "af_ZA" "af_ZA.utf8" "am_ET"
## ...
I don't have Solaris or Mac machines to hand, but I guess that the values for those systems can be generated from the Linux output using something like:
library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1]) #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1]) #Mac
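The same two guesses can be sketched in base R, without stringr. The `locales` vector here is a small made-up sample standing in for real `locale -a` output, and the variable names are only illustrative. Whether the resulting names are actually accepted on Solaris or Mac is still an assumption that needs checking on those systems.

```r
# Hypothetical sample of `locale -a` output, so the sketch runs anywhere.
locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8",
             "am_ET", "de_DE", "de_DE.utf8")

# Solaris-style guess: keep only the part before the first "_".
language_only <- unique(sub("_.*$", "", locales))

# Mac-style guess: keep only the part before the first ".".
without_encoding <- unique(sub("\\..*$", "", locales))

print(language_only)
print(without_encoding)
```

Note that the first guess leaves entries such as "C.utf8" untouched, because they contain no underscore.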
Locales on Windows are much more problematic: they require long names of the form “language_country”, for example:
Sys.setlocale("LC_ALL", "German_Germany")
I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless cygwin is installed, and then it returns the same values as under Linux (I'm guessing it's accessing values in a standard C library).
There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab, which contains time zone details).
My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table:
http://msdn.microsoft.com/en-us/library/dd318693.aspx
Some guesswork is needed; for example, the locale related to SUBLANG_ENGLISH_UK is English_United Kingdom.
Sys.setlocale("LC_ALL", "English_United Kingdom")
Where there are variants in different alphabets, parentheses are needed.
Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")
This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.
Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
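A rough way to test a batch of guessed names without leaving the session in a changed locale might look like the sketch below. `locale_works` is a hypothetical helper, not something from R itself. It relies on Sys.setlocale returning an empty string (with a warning) when the request cannot be honoured. It works on a single category such as "LC_TIME", on the assumption that restoring one category's value is more dependable than restoring the composite "LC_ALL" string.

```r
# Hypothetical helper: TRUE if Sys.setlocale() accepts `name` for `category`.
locale_works <- function(name, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  # Put the original setting back however the function exits.
  on.exit(suppressWarnings(Sys.setlocale(category, old)), add = TRUE)
  # A failed request returns "" and raises a warning, which is silenced here.
  result <- suppressWarnings(Sys.setlocale(category, name))
  nzchar(result)
}

# Candidate names to try; the results depend entirely on the OS.
candidates <- c("C", "German_Germany", "Hindi_India", "de_DE.UTF-8")
print(vapply(candidates, locale_works, logical(1)))
```

Only "C" should be accepted everywhere; the other results will differ between Windows, Linux and Mac.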
The Windows Region and Language dialog box (Windows\System32\intl.cpl, see pic) has a similar but not identical list of available locales, but I don't know where that is populated from.
There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?
Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example
Sys.setlocale(, "Inuktitut (Latin)_Canada")
is fine, but
Sys.setlocale(, "Inuktitut (Syllabics)_Canada")
fails (as do most of the Indian languages). Starting R in any of these locales causes a warning, and R's locale reverts to C.
I'm still interested to hear from any Indians, etc., as to what locale you have.
The locale describes aspects of the internationalization of a program. Initially most aspects of the locale of R are set to "C" (which is the default for the C language and reflects North-American usage, also known as "POSIX").
Sys.setlocale returns a string describing the current locale after setting it to what you asked for. Sys.getlocale returns a string describing the current locale. If the category argument is "LC_ALL", it returns a string describing the locales for each of the categories.
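As a small illustration of those return values, the sketch below sets one category to "C" (which should be accepted on every platform), compares the returned string with what Sys.getlocale reports, and then restores the earlier setting. The variable names are only illustrative.

```r
# Remember the current collation locale so it can be restored afterwards.
old <- Sys.getlocale("LC_COLLATE")

# Sys.setlocale() returns the locale now in use for that category.
returned <- Sys.setlocale("LC_COLLATE", "C")

# Sys.getlocale() for the same category should report the same string.
reported <- Sys.getlocale("LC_COLLATE")
print(identical(returned, reported))

# Put the original setting back.
invisible(suppressWarnings(Sys.setlocale("LC_COLLATE", old)))
```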
In answer to your first question, here's the output on my Mac:
> locales <- system("locale -a", intern = TRUE)
> library(stringr)
> unique(str_split_fixed(locales, "\\.", 2)[, 1])
[1] "af_ZA" "am_ET" "be_BY" "bg_BG" "ca_ES" "cs_CZ" "da_DK" "de_AT" "de_CH"
[10] "de_DE" "el_GR" "en_AU" "en_CA" "en_GB" "en_IE" "en_NZ" "en_US" "es_ES"
[19] "et_EE" "eu_ES" "fi_FI" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "he_IL" "hi_IN"
[28] "hr_HR" "hu_HU" "hy_AM" "is_IS" "it_CH" "it_IT" "ja_JP" "kk_KZ" "ko_KR"
[37] "lt_LT" "nl_BE" "nl_NL" "no_NO" "pl_PL" "pt_BR" "pt_PT" "ro_RO" "ru_RU"
[46] "sk_SK" "sl_SI" "sr_YU" "sv_SE" "tr_TR" "uk_UA" "zh_CN" "zh_HK" "zh_TW"
[55] "C" "POSIX"
I'm not sure what I'm expecting to see with Sys.setlocale(), but it doesn't throw any errors:
> Sys.setlocale(locale="he_IL")
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
> Sys.getlocale()
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
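The slash-separated string is easier to read if each category is queried by name instead, since the order of the fields is OS-dependent. A minimal sketch follows. The `categories` vector is my own choice, and it leaves out "LC_MESSAGES" because that category is not supported on Windows. If I read the help page correctly, "LC_ALL" does not set LC_NUMERIC or LC_MESSAGES, which may be why two fields above did not change to he_IL.

```r
# Categories that Sys.getlocale() accepts on all platforms (LC_MESSAGES omitted).
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")

# Query each category separately; the result is a named character vector.
current <- vapply(categories, Sys.getlocale, character(1))
print(current)
```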