
Using weekdays with any locale under Windows

I'm trying to get the day of the week, and have it work consistently in any locale. In locales with Latin alphabets, everything is fine.

Sys.getlocale()
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
weekdays(Sys.Date())
## [1] "Tuesday"

I have two related problems with other locales.

If I set

Sys.setlocale("LC_ALL", "Arabic_Qatar")
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"

then I sometimes (correctly) get

weekdays(Sys.Date())
## [1] "الثلاثاء"

and sometimes get

weekdays(Sys.Date())
## [1] "ÇáËáÇËÇÁ"

depending upon my setup. The problem is, I can't figure out what is causing the difference.

I thought it might be something to do with getOption("encoding"), but I've tried explicitly setting options(encoding = "native.enc") and options(encoding = "UTF-8") and it makes no difference.

I've tried several recent versions of R, and the problem is consistent across all of them.

At the moment, the string displays correctly in the R GUI, but incorrectly when I use an IDE (Architect and RStudio tested).

What should I set to ensure that weekdays always displays correctly?

It may be helpful to know that weekdays(Sys.Date()) is equivalent to format(as.POSIXlt(Sys.Date()), "%A"), which calls an internal format.POSIXlt method.
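A quick way to check this equivalence is sketched below. It uses a fixed date rather than Sys.Date() so the result is reproducible, and it temporarily switches LC_TIME to the portable "C" locale (an assumption made only for the demo, so the day names come out in English on any OS).

```r
# Compare weekdays() with the underlying format.POSIXlt() call for a fixed date.
old_time <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))   # "C" gives English day names everywhere

d <- as.Date("2014-10-28")                 # the date the question was asked (a Tuesday)
via_weekdays <- weekdays(d)
via_format   <- format(as.POSIXlt(d), "%A")

print(c(via_weekdays, via_format))         # both "Tuesday"
invisible(Sys.setlocale("LC_TIME", old_time))  # restore the previous setting
```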

Secondly, it seems overkill to change all of the locale. I thought I should just be able to set the time options. However, if I set individual components of the locale, weekdays returns a string of question marks.

for(category in c("LC_TIME", "LC_CTYPE", "LC_COLLATE", "LC_MONETARY"))
{
  Sys.setlocale(category, "Arabic_Qatar")
  print(Sys.getlocale())
  print(weekdays(Sys.Date()))
}
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"

What parts of the locale affect how the weekdays are printed?
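When comparing setups (for example RGui against RStudio), it can help to dump the categories that turn out to matter alongside R's view of the native encoding. The helper below is a hypothetical diagnostic, not part of base R; it only reads settings and changes nothing.

```r
# Report LC_CTYPE, LC_TIME and R's view of the native encoding.
locale_report <- function() {
  info <- l10n_info()
  cp <- info[["codepage"]]   # exact match; this element only exists on Windows
  c(
    LC_CTYPE = Sys.getlocale("LC_CTYPE"),
    LC_TIME  = Sys.getlocale("LC_TIME"),
    codepage = if (is.null(cp)) NA_character_ else as.character(cp),
    utf8     = as.character(info[["UTF-8"]])  # is the native encoding UTF-8?
  )
}

print(locale_report())
```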


Update: The problem seems to be Windows-related. When I run the code on a Linux box with locale "ar_QA.UTF8", the weekdays are correctly displayed.


Further update: As agstudy mentioned in his answer, setting locales under Windows is odd, since you can't just use ISO codes like "en-GB". For Windows 7/Vista/Server 2003/XP you can set a locale using setlocale language strings or National Language Support values. For Qatari Arabic, there is no setlocale language string, so we must use an NLS value. We have several choices:

Sys.setlocale("LC_TIME", "ARQ")    # the language abbreviation name
Sys.setlocale("LC_TIME", "Arabic_Qatar") # corresponding to the language/country pair "Arabic (Qatar)"
Sys.setlocale("LC_TIME", "Arabic_Qatar.1256") # explicitly including the ANSI codepage
Sys.setlocale("LC_TIME", "Arabic") # would sometimes be a possibility too, but it defaults to Saudi Arabic

So the problem isn't that R cannot support Arabic locales under Windows (though I'm not entirely convinced of the robustness of Sys.setlocale).
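To see which of these names a given machine actually accepts, one option is to try each in turn and check the return value: Sys.setlocale() returns "" (with a warning) when the OS cannot honour the request. The sketch below uses a hypothetical helper, probe_locales(), and restores the original setting afterwards.

```r
# Try each candidate name for a locale category and record whether the OS
# accepted it. Sys.setlocale() returns "" when the request cannot be honoured.
probe_locales <- function(candidates, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)   # always restore
  vapply(candidates, function(loc) {
    nzchar(suppressWarnings(Sys.setlocale(category, loc)))
  }, logical(1))
}

# The Windows names listed above; on Linux these typically all fail.
probe_locales(c("ARQ", "Arabic_Qatar", "Arabic_Qatar.1256", "Arabic"))
```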


Desperate last ditch attempt: Trying to magically fix things by using Windows Management Instrumentation Command to change the OS locale doesn't work, since R doesn't appear to recognise the changes.

system("wmic os set locale=MS_4001") 
## Updating property(s) of '\\PC402729\ROOT\CIMV2:Win32_OperatingSystem=@'
## Property(s) update successful.
system("wmic os get locale") # same as before
asked Oct 28 '14 at 08:10 by Richie Cotton


2 Answers

The system of naming locales is OS-specific. I recommend reading the "Locales" section of the R Installation and Administration manual for a complete explanation.

Under Windows:

The supported languages are listed on the MSDN Language Strings page, and surprisingly Arabic is not among them. The "Language string" column gives the legal input for setting the locale in R, and even the list of country/region strings contains no Arabic-speaking country.

Of course, you can change your locale in the global settings (Control Panel --> Region --> ...), but this changes it system-wide, and there is no guarantee of getting the right output without encoding problems.

Under Linux (Ubuntu in my case):

Arabic locales are generally not installed by default, but they are easy to add with the locale tools:

 locale -a                     ## to list all already supported language
 sudo locale-gen ar_QA.UTF-8   ## generate it if it does not exist
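If you want to check from inside R whether the locale is already installed before calling Sys.setlocale(), a minimal sketch is to call `locale -a` via system2(). The helper name locale_installed() is hypothetical, and it assumes a Unix-alike with the locale utility; where that command is missing it simply returns FALSE.

```r
# Check (from R) whether `locale -a` lists a given locale.
# Assumes a Unix-alike with the `locale` utility; elsewhere it returns FALSE.
locale_installed <- function(locale) {
  available <- tryCatch(
    suppressWarnings(system2("locale", "-a", stdout = TRUE, stderr = FALSE)),
    error = function(e) character(0)
  )
  # `locale -a` usually prints "ar_QA.utf8" where users type "ar_QA.UTF-8"
  normalise <- function(x) sub("utf-?8", "utf8", tolower(x))
  normalise(locale) %in% normalise(available)
}

locale_installed("ar_QA.UTF-8")
```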

Now, in RStudio:

> Sys.setlocale('LC_TIME','ar_QA.UTF-8')
[1] "ar_QA.UTF-8"

> format(Sys.Date(),'%A')
[1] "الثلاثاء"

Note also that the R console does not render this as nicely as RStudio, because it prints the Arabic text left to right rather than right to left.

answered Oct 21 '22 at 17:10 by agstudy


The RStudio/Architect problem

This can be solved, slightly messily, by explicitly changing the encoding of the weekdays string to UTF-8.

current_codepage <- as.character(l10n_info()$codepage)
iconv(weekdays(Sys.Date()), from = current_codepage, to = "utf8")

Note that codepages only exist on Windows; l10n_info()$codepage is NULL on Linux.
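Because the codepage is missing on Linux (and irrelevant on Windows builds that already run in UTF-8), the conversion can be wrapped so that the same call works everywhere. The helper below is a hypothetical sketch: it falls back to enc2utf8() when there is no codepage, and it writes the source encoding in the conventional "CP1256" form rather than the bare number used above, which is an assumption about the iconv name your build accepts.

```r
# Convert a string from the native encoding to UTF-8, portably.
# On non-UTF-8 Windows builds the native encoding is the ANSI codepage;
# elsewhere there is no codepage and enc2utf8() is sufficient.
native_to_utf8 <- function(x) {
  info <- l10n_info()
  cp <- info[["codepage"]]   # NULL on Linux/macOS
  if (isTRUE(info[["UTF-8"]]) || is.null(cp)) {
    return(enc2utf8(x))
  }
  iconv(x, from = paste0("CP", cp), to = "UTF-8")
}

native_to_utf8(weekdays(Sys.Date()))
```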

The LC_TIME problem

It turns out that under Windows you have to set both the LC_CTYPE and LC_TIME locale categories, and you have to set LC_CTYPE before LC_TIME, or it won't work.
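The ordering rule can be captured in a small general-purpose helper that sets the categories in the order given, evaluates an expression, and restores the old settings even if the expression fails. with_locale_categories() is a hypothetical name; the default order (LC_CTYPE, then LC_TIME) follows the observation above.

```r
# Evaluate `expr` with several locale categories temporarily set to `locale`.
# Categories are set in the order given (LC_CTYPE first, as described above)
# and restored in reverse order on exit, even if `expr` throws an error.
with_locale_categories <- function(locale, expr,
                                   categories = c("LC_CTYPE", "LC_TIME")) {
  old <- vapply(categories, Sys.getlocale, character(1))
  on.exit({
    for (category in rev(categories)) Sys.setlocale(category, old[[category]])
  }, add = TRUE)
  for (category in categories) {
    if (!nzchar(suppressWarnings(Sys.setlocale(category, locale)))) {
      stop("Could not set ", category, " to '", locale, "'")
    }
  }
  force(expr)  # lazy evaluation: `expr` runs only now, under the new locale
}

with_locale_categories("C", weekdays(as.Date("2014-10-28")))
# On Windows: with_locale_categories("Arabic_Qatar", weekdays(Sys.Date()))
```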


In the end, we need different implementations for different OSes.
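Part of the OS difference is simply how locales are spelled. One way to keep calling code OS-agnostic is a lookup keyed on .Platform$OS.type ("windows" or "unix"). The keys and the locale_name() helper below are hypothetical; the values are the names used in the examples that follow. Note that macOS also reports "unix", but its locale names may differ from the Linux ones listed here.

```r
# Map a short, hypothetical language key to the locale names used in the
# examples below; the right spelling depends on the OS.
locale_names <- list(
  fr    = c(windows = "French_France",                unix = "fr_FR.utf8"),
  ar_QA = c(windows = "Arabic_Qatar",                 unix = "ar_QA.utf8"),
  sr    = c(windows = "Serbian (Cyrillic)",           unix = "sr_RS.utf8"),
  zh_TW = c(windows = "Chinese (Traditional)_Taiwan", unix = "zh_TW.utf8")
)

locale_name <- function(key, os = .Platform$OS.type) {
  names_for_key <- locale_names[[key]]
  if (is.null(names_for_key)) stop("No locale mapping for '", key, "'")
  unname(names_for_key[[os]])
}

locale_name("ar_QA")   # "Arabic_Qatar" on Windows, "ar_QA.utf8" elsewhere
```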

Windows version:

get_today_windows <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    lc_ctype <- Sys.getlocale("LC_CTYPE")
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_CTYPE", lc_ctype))
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    Sys.setlocale("LC_CTYPE", locale)
    Sys.setlocale("LC_TIME", locale)
  }
  today <- weekdays(Sys.Date())
  current_codepage <- as.character(l10n_info()$codepage)
  iconv(today, from = current_codepage, to = "utf8")
}
get_today_windows() 
## [1] "Tuesday"
get_today_windows("French_France")
## [1] "mardi"
get_today_windows("Arabic_Qatar")
## [1] "الثلاثاء"
get_today_windows("Serbian (Cyrillic)") 
## [1] "уторак"
get_today_windows("Chinese (Traditional)_Taiwan") 
## [1] "星期二"

Linux version:

get_today_linux <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    Sys.setlocale("LC_TIME", locale)
  }
  weekdays(Sys.Date())
}
get_today_linux() 
## [1] "Tuesday"
get_today_linux("fr_FR.utf8")
## [1] "mardi"
get_today_linux("ar_QA.utf8")
## [1] "الثلاثاء"
get_today_linux("sr_RS.utf8") 
## [1] "уторак"
get_today_linux("zh_TW.utf8") 
## [1] "週二"

Enforcing the .utf8 encoding in the locale name seems important: get_today_linux("zh_TW") doesn't display properly.
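To enforce this automatically, a locale name without an explicit codeset could have ".utf8" appended before it is passed to get_today_linux(). The helper below is a hypothetical sketch for a single Linux-style name: it leaves names that already contain a codeset (or "C"/"POSIX") alone and keeps any "@modifier" at the end.

```r
# Append ".utf8" to a Linux-style locale name that has no codeset,
# e.g. "zh_TW" -> "zh_TW.utf8", "sr_RS@latin" -> "sr_RS.utf8@latin".
ensure_utf8_locale <- function(locale) {
  if (grepl(".", locale, fixed = TRUE) || locale %in% c("C", "POSIX", "")) {
    return(locale)                       # codeset already given, or special name
  }
  at <- regexpr("@", locale, fixed = TRUE)
  if (at > 0) {
    # The codeset goes before the modifier: language_TERRITORY.codeset@modifier
    paste0(substr(locale, 1, at - 1), ".utf8", substring(locale, at))
  } else {
    paste0(locale, ".utf8")
  }
}

ensure_utf8_locale("zh_TW")
```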

answered Oct 21 '22 at 18:10 by Richie Cotton