I've run into strange behavior of c() with R 3.3.2 on Windows with a non-US-English locale: it converts the names of named vectors to UTF-8.
x <- "φ"
names(x) <- "φ"
Encoding(names(x))
#> [1] "unknown"
Encoding(names(c(x)))
#> [1] "UTF-8"
Though this issue is not a problem for most people, it is critical for those who use named vectors as lookup tables (see the example here: http://adv-r.had.co.nz/Subsetting.html#applications). I have also been stuck on this behavior with dplyr's select() function.
I'm not quite sure whether this behavior is a bug or by design. Should I submit a bug report to R core?
Here's info about my R environment:
sessionInfo()
#> R version 3.3.2 (2016-10-31)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows >= 8 x64 (build 9200)
#>
#> locale:
#> [1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932 LC_MONETARY=Japanese_Japan.932
#> [4] LC_NUMERIC=C LC_TIME=Japanese_Japan.932
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] tools_3.3.2
You should still see names(c(x)) == names(x) on your system. The encoding change made by c() may be unintentional, but it shouldn't affect your code in most scenarios.
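A minimal sketch of that check, assuming `enc2native("\u03c6")` stands in for typing φ directly in a native-encoded (e.g. CP932) session. The declared encodings reported by Encoding() may differ between the two objects (on a UTF-8 system both are typically "UTF-8"), but comparison should still report a match because R translates strings before comparing them:

```r
# Stand-in for the question's example: phi in the native encoding.
x <- enc2native("\u03c6")
names(x) <- enc2native("\u03c6")

# Declared encodings; c() may change the one attached to the names.
Encoding(names(x))
Encoding(names(c(x)))

# Comparison translates both sides first, so the names still match.
names(c(x)) == names(x)
identical(names(c(x)), names(x))
```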
On Windows, which doesn't have a UTF-8 locale, your safest bet is to convert all strings to UTF-8 first via enc2utf8(), and then stay in UTF-8. This will also enable safe lookups.
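A sketch of the "convert once, then stay in UTF-8" approach for a lookup table. The table contents and key vector are hypothetical; the point is that both the names and the incoming keys go through enc2utf8() before subsetting:

```r
# Hypothetical lookup table keyed by non-ASCII names; keys stored as UTF-8.
lookup <- setNames(c("phi", "psi"), enc2utf8(c("\u03c6", "\u03c8")))

# Convert incoming keys to UTF-8 too, then subset by name.
keys <- enc2utf8(c("\u03c8", "\u03c6", "\u03c6"))
result <- unname(lookup[keys])
result
```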
Language symbols (as used in dplyr's group_by()) are an entirely different issue. For some reason they are always interpreted in the native encoding. (Try as.name(names(c(x))).) However, it's still best to keep them in UTF-8 and convert to native just before calling as.name(). This is what dplyr should be doing; we're just not quite there yet.
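A minimal sketch of converting to native "just before calling as.name()". The variable names are hypothetical; on a locale that cannot represent φ, enc2native() may substitute an escape such as `<U+03C6>`, so this only shows where the conversion belongs, not that every name survives it:

```r
# Keep the column name in UTF-8 while working with it as a string ...
col <- enc2utf8("\u03c6")

# ... and translate to the native encoding only when building the symbol.
sym <- as.name(enc2native(col))
sym
```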
My recommendation is to use ASCII-only characters for column names when using dplyr on Windows. This requires some discipline if you're relying on tidyr::spread() for non-ASCII column contents. You could also consider switching to a system (OS X or Linux) that works with UTF-8 natively.
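One way to enforce that discipline is to check column names before handing a data frame to dplyr. The sketch below uses a hypothetical helper, has_non_ascii(), built on base R's iconv(), which returns NA for strings it cannot convert (its default is `sub = NA`). It only flags offending names and leaves renaming to you:

```r
# Hypothetical helper: TRUE for names that cannot be represented in ASCII.
has_non_ascii <- function(x) {
  is.na(iconv(enc2utf8(x), from = "UTF-8", to = "ASCII"))
}

# Example data frame with one non-ASCII column name (phi).
df <- data.frame(id = 1:2, value = c(10, 20))
names(df)[2] <- "\u03c6"

flags <- has_non_ascii(names(df))
flags
```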