I'm trying to get the day of the week, and have it work consistently in any locale. In locales with Latin alphabets, everything is fine.
Sys.getlocale()
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
weekdays(Sys.Date())
## [1] "Tuesday"
I have two related problems with other locales.
If I set
Sys.setlocale("LC_ALL", "Arabic_Qatar")
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
then I sometimes (correctly) get
weekdays(Sys.Date())
## [1] "الثلاثاء"
and sometimes get
weekdays(Sys.Date())
## [1] "ÇáËáÇËÇÁ"
depending upon my setup. The problem is, I can't figure out what is causing the difference.
I thought it might be something to do with getOption("encoding"), but I've tried explicitly setting options(encoding = "native.enc") and options(encoding = "UTF-8"), and it makes no difference.
I've tried several recent versions of R, and the problem is consistent across all of them.
At the moment, the string displays correctly in R GUI, but incorrectly when I use an IDE (Architect and RStudio tested).
What should I set to ensure that weekdays always displays correctly?
It may be helpful to know that weekdays(Sys.Date()) is equivalent to format(as.POSIXlt(Sys.Date()), "%A"), which calls an internal format.POSIXlt method.
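As a quick check of that equivalence, the sketch below compares the two calls on a fixed date (the date is an arbitrary choice for illustration). It only tests that the two results agree, since the actual day name depends on the active LC_TIME setting.

```r
# Arbitrary fixed date, chosen only so the check is reproducible
d <- as.Date("2013-07-09")

via_weekdays <- weekdays(d)
via_format   <- format(as.POSIXlt(d), "%A")  # "%A" is the full weekday name

# Both routes end up in the same formatting code, so they should agree
identical(via_weekdays, via_format)
```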
Secondly, it seems overkill to change the whole locale. I thought I should be able to set just the time category. However, if I set individual components of the locale, weekdays returns a string of question marks.
for(category in c("LC_TIME", "LC_CTYPE", "LC_COLLATE", "LC_MONETARY"))
{
  Sys.setlocale(category, "Arabic_Qatar")
  print(Sys.getlocale())
  print(weekdays(Sys.Date()))
}
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
## [1] "LC_COLLATE=Arabic_Qatar.1256;LC_CTYPE=Arabic_Qatar.1256;LC_MONETARY=Arabic_Qatar.1256;LC_NUMERIC=C;LC_TIME=Arabic_Qatar.1256"
## [1] "????????"
What parts of the locale affect how the weekdays are printed?
Update: The problem seems to be Windows-related. When I run the code on a Linux box with locale "ar_QA.UTF8", the weekdays are correctly displayed.
Further update: As agstudy mentioned in his answer, setting locales under Windows is odd, since you can't just use ISO codes like "en-GB". For Windows 7/Vista/Server 2003/XP you can set a locale using setlocale language strings or National Language Support values. For Qatari Arabic, there is no setlocale language string, so we must use an NLS value. We have several choices:
Sys.setlocale("LC_TIME", "ARQ") # the language abbreviation name
Sys.setlocale("LC_TIME", "Arabic_Qatar") # corresponding to the language/country pair "Arabic (Qatar)"
Sys.setlocale("LC_TIME", "Arabic_Qatar.1256") # explicitly including the ANSI codepage
Sys.setlocale("LC_TIME", "Arabic") # would sometimes be a possibility too, but it defaults to Saudi Arabic
So the problem isn't that R cannot support Arabic locales under Windows (though I'm not entirely convinced of the robustness of Sys.setlocale).
Desperate last ditch attempt: Trying to magically fix things by using Windows Management Instrumentation Command to change the OS locale doesn't work, since R doesn't appear to recognise the changes.
system("wmic os set locale=MS_4001")
## Updating property(s) of '\\PC402729\ROOT\CIMV2:Win32_OperatingSystem=@'
## Property(s) update successful.
system("wmic os get locale") # same as before
The system of naming locales is OS-specific. I recommend reading the section on locales in the R Installation and Administration manual for a complete explanation.
The supported languages are listed on MSDN under Language Strings. Surprisingly, Arabic is not among them. The "Language string" column contains the legal input for setting a locale in R, and even the list of country/region strings contains no Arabic-speaking country.
Of course, you can change your global locale settings (Control Panel --> Region --> ...), but this changes the locale system-wide, and it is not guaranteed to give the right output without encoding problems.
On Linux, Arabic is generally not installed by default, but it is easy to add using the locale tools.
locale -a ## to list all already supported language
sudo locale-gen ar_QA.UTF-8 ## install it in case does not exist
Then, in RStudio:
Sys.setlocale('LC_TIME','ar_QA.UTF-8')
[1] "ar_QA.UTF-8"
> format(Sys.Date(),'%A')
[1] "الثلاثاء"
Note also that in the R console the output does not look as good as in RStudio, because it is written left to right rather than right to left.
The RStudio/Architect problem
This can be solved, slightly messily, by explicitly changing the encoding of the weekdays string to UTF-8.
current_codepage <- as.character(l10n_info()$codepage)
iconv(weekdays(Sys.Date()), from = current_codepage, to = "utf8")
Note that codepages only exist on Windows; l10n_info()$codepage is NULL on Linux.
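To see why re-encoding helps, the sketch below reproduces the garbled output from the question without changing any locale. It assumes the platform's iconv recognises the encoding names "CP1256" and "CP1252" (names vary between iconv implementations), and it is only a plausible reading of what the IDE does: the bytes are valid Windows-1256 text but get decoded as if they were Western European.

```r
# The Arabic word for Tuesday, written with \u escapes so that the encoding
# of the source file itself does not matter
tuesday <- "\u0627\u0644\u062b\u0644\u0627\u062b\u0627\u0621"

# The bytes an Arabic Windows locale (codepage 1256) would produce
bytes_1256 <- iconv(tuesday, from = "UTF-8", to = "CP1256")

# Decoding those bytes with the wrong codepage gives mojibake
garbled <- iconv(bytes_1256, from = "CP1252", to = "UTF-8")

# Decoding them with the right codepage recovers the original text
recovered <- iconv(bytes_1256, from = "CP1256", to = "UTF-8")

identical(recovered, tuesday)
```

Here garbled is the same "ÇáËáÇËÇÁ" string shown in the question, which is consistent with a codepage mismatch rather than a problem in weekdays itself.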
The LC_TIME problem
It turns out that under Windows you have to set both the LC_CTYPE and LC_TIME locale categories, and you have to set LC_CTYPE before LC_TIME, or it won't work.
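The ordering rule can be wrapped in a small helper. This is a minimal sketch, not part of the original solution: set_time_locale is a hypothetical name, and it relies on Sys.setlocale returning an empty string (with a warning) when the OS rejects the request.

```r
# Hypothetical helper: switch LC_CTYPE first, then LC_TIME, and fail loudly
# if the OS rejects either request. Returns the previous settings invisibly
# so the caller can restore them afterwards.
set_time_locale <- function(locale) {
  old <- c(LC_CTYPE = Sys.getlocale("LC_CTYPE"),
           LC_TIME  = Sys.getlocale("LC_TIME"))
  for (category in c("LC_CTYPE", "LC_TIME")) {  # order matters on Windows
    result <- suppressWarnings(Sys.setlocale(category, locale))
    if (!nzchar(result)) {
      # Roll back anything already changed before signalling the failure
      for (done in names(old)) Sys.setlocale(done, old[[done]])
      stop("locale '", locale, "' is not available for ", category)
    }
  }
  invisible(old)
}
```

A caller would keep the returned vector and pass its elements back to Sys.setlocale once finished, in the same LC_CTYPE-then-LC_TIME order.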
In the end, we need different implementations for different OSes.
Windows version:
get_today_windows <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    # Remember the current settings and restore them when the function exits
    lc_ctype <- Sys.getlocale("LC_CTYPE")
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_CTYPE", lc_ctype))
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    # LC_CTYPE must be set before LC_TIME
    Sys.setlocale("LC_CTYPE", locale)
    Sys.setlocale("LC_TIME", locale)
  }
  today <- weekdays(Sys.Date())
  # Convert from the active Windows codepage to UTF-8 for display
  current_codepage <- as.character(l10n_info()$codepage)
  iconv(today, from = current_codepage, to = "utf8")
}
get_today_windows()
## [1] "Tuesday"
get_today_windows("French_France")
## [1] "mardi"
get_today_windows("Arabic_Qatar")
## [1] "الثلاثاء"
get_today_windows("Serbian (Cyrillic)")
## [1] "уторак"
get_today_windows("Chinese (Traditional)_Taiwan")
## [1] "星期二"
Linux version:
get_today_linux <- function(locale = NULL)
{
  if(!is.null(locale))
  {
    lc_time <- Sys.getlocale("LC_TIME")
    on.exit(Sys.setlocale("LC_TIME", lc_time), add = TRUE)
    Sys.setlocale("LC_TIME", locale)
  }
  weekdays(Sys.Date())
}
get_today_linux()
## [1] "Tuesday"
get_today_linux("fr_FR.utf8")
## [1] "mardi"
get_today_linux("ar_QA.utf8")
## [1] "الثلاثاء"
get_today_linux("sr_RS.utf8")
## [1] "уторак"
get_today_linux("zh_TW.utf8")
## [1] "週二"
Enforcing the .utf8 encoding in the locale seems important: get_today_linux("zh_TW") doesn't display properly.
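One way to guard against that is to check the locale name before using it. The helper below is a hypothetical sketch (has_utf8_suffix is not an existing R function): it only inspects the name for a trailing .utf8 or .UTF-8 suffix and does not ask the OS whether the locale is actually installed.

```r
# Hypothetical helper: TRUE when a locale name ends in ".utf8" or ".UTF-8"
# (case-insensitive). It checks the name only, not what the OS provides.
has_utf8_suffix <- function(locale) {
  grepl("\\.utf-?8$", locale, ignore.case = TRUE)
}

has_utf8_suffix(c("zh_TW.utf8", "ar_QA.UTF-8", "zh_TW"))
```

The check could be called at the top of get_today_linux to warn when a name such as "zh_TW" is passed without an explicit encoding.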