
UTF-8 support in R on Windows

Since Windows 10 added the option 'Beta: Use Unicode UTF-8 for worldwide language support', I thought it would be possible to switch R's locale to UTF-8. However, when I try to change the locale to UTF-8 with

Sys.setlocale(locale = "Japanese_Japan.65001") 

or

Sys.setlocale(locale = "Japanese_Japan.UTF-8") 

I get

In Sys.setlocale("Japanese_Japan.65001") :
OS reports request to set locale to "Japanese_Japan.65001" cannot be honored

For now, does Windows allow R to use UTF-8?

(Because I am not very familiar with locale problems, I welcome comments if more information is needed.)

Information

> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"
tragoat asked Sep 02 '25 09:09



1 Answer

UPDATE: The (upcoming) R 4.2.0 should fully support UTF-8 on Windows: https://developer.r-project.org/Blog/public/2021/12/07/upcoming-changes-in-r-4.2-on-windows/
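To see which situation a given installation is in, a small check along these lines may help. It is only a sketch: `session_utf8_info` is a name made up for this example, and it relies solely on base R functions (`l10n_info()`, `Sys.getlocale()`, `getRversion()`).

```r
# Hypothetical helper (not part of R): summarise whether the running
# session uses UTF-8 as its native encoding.
session_utf8_info <- function() {
  # l10n_info() is base R; its "UTF-8" element is TRUE in a UTF-8 locale.
  info <- l10n_info()
  list(
    r_version = as.character(getRversion()),
    os_type   = .Platform$OS.type,
    lc_ctype  = Sys.getlocale("LC_CTYPE"),
    utf8      = isTRUE(info[["UTF-8"]])
  )
}

str(session_utf8_info())
```

If `utf8` is `FALSE` on Windows, the session is presumably still running in a legacy code page such as the 932 shown in the question.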


It appears that the R developers have built experimental binaries that fully support UTF-8 on Windows 10. However, as of 2020-07-30 the project was still marked as "experimental", and the official conclusion was:

Based also on this experience, I believe that switching to UCRT is already possible and I expect that building a complete toolchain should take a small number of months. It is I think the only realistic way to support Unicode characters (not representable in native encoding) reliably in R on Windows.

This suggests that full UTF-8 support in R on Windows is still planned for a somewhat more distant future.

Source: https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html
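Until a UTF-8 build is in use, scripts that request a UTF-8 locale can at least fail gracefully instead of emitting the warning from the question. The following is a minimal sketch, assuming that `Sys.setlocale()` signals a warning and leaves the locale unchanged when the request cannot be honored; `try_set_locale` is a hypothetical name, not an R function.

```r
# Hypothetical helper: try to switch one locale category and report
# whether the OS accepted the request.
try_set_locale <- function(locale, category = "LC_CTYPE") {
  old <- Sys.getlocale(category)
  result <- tryCatch(
    Sys.setlocale(category, locale),
    # A rejected request raises a warning; map it to an empty string.
    warning = function(w) ""
  )
  ok <- nzchar(result)
  if (!ok) {
    message("Locale '", locale, "' is not available; keeping '", old, "'")
  }
  invisible(ok)
}

# Locale string taken from the question; whether it is accepted depends
# on the R version and the Windows build.
try_set_locale("Japanese_Japan.UTF-8")
```

The function returns `TRUE` only when the locale was actually changed, so calling code can branch on the result.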

Martin Modrák answered Sep 05 '25 00:09
