
Letter "y" comes after "i" when sorting alphabetically

When I call sort(x) on a character vector x, the letter "y" jumps into the middle, right after the letter "i":

    > letters
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
    [21] "u" "v" "w" "x" "y" "z"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    [21] "t" "u" "v" "w" "x" "z"

The reason may be that I am located in Lithuania and this is the Lithuanian sort order, in which "y" follows "i". I need the plain a-to-z order instead. How do I switch the sort order back from within R code?

I'm using R 2.15.2 on Win7.

asked Jan 22 '13 12:01 by zezere



2 Answers

You need to change the locale that R is running in. Either do that for your entire Windows install (which seems suboptimal) or within an R session via:

Sys.setlocale("LC_COLLATE", "C") 

You can use any other valid locale string in place of "C", but "C" should get you back to the letter order you want.

Read ?locales for more.
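If changing the session locale is not an option, newer versions of R offer another route. As a hedged aside: this was not available in R 2.15.2 (as far as I know it needs R 3.3.0 or later), but sort() accepts method = "radix", which is documented to order character vectors in the C locale regardless of the current LC_COLLATE setting. A minimal sketch, where x and mixed are made-up example vectors:

```r
# Example data (hypothetical, not from the question)
x <- c("j", "y", "i", "z", "a")

# method = "radix" ignores LC_COLLATE and uses C-locale ordering
sorted_radix <- sort(x, method = "radix")
print(sorted_radix)

# Caveat: C-locale ordering compares byte values, so all uppercase
# ASCII letters sort before all lowercase ones
mixed <- sort(c("b", "A", "a", "B"), method = "radix")
print(mixed)
```

The same caveat applies to Sys.setlocale("LC_COLLATE", "C"): it is fine for lowercase letters like these, but mixed-case data will not come out in dictionary order.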

I suppose it is worth noting the sister function Sys.getlocale(), which queries the current setting of a locale parameter. Hence you could do

    (locCol <- Sys.getlocale("LC_COLLATE"))
    Sys.setlocale("LC_COLLATE", "lt_LT")
    sort(letters)
    Sys.setlocale("LC_COLLATE", locCol)
    sort(letters)
    Sys.getlocale("LC_COLLATE")

    ## giving:
    > (locCol <- Sys.getlocale("LC_COLLATE"))
    [1] "en_GB.UTF-8"
    > Sys.setlocale("LC_COLLATE", "lt_LT")
    [1] "lt_LT"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
    [16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
    > Sys.setlocale("LC_COLLATE", locCol)
    [1] "en_GB.UTF-8"
    > sort(letters)
     [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
    [16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
    > Sys.getlocale("LC_COLLATE")
    [1] "en_GB.UTF-8"

which of course is what @hadley's answer shows with_collate() doing somewhat more succinctly, once you have devtools installed.
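If you would rather not depend on an extra package, the save-and-restore pattern can be wrapped in a small base-R helper. This is only a sketch: with_c_collation is a hypothetical name, not a function from base R or devtools, and it relies on on.exit() to put the original collation back even if the sorted expression throws an error.

```r
# Hypothetical helper: evaluate `expr` with LC_COLLATE set to "C",
# then restore whatever collation was active before the call.
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  # Registered before the change so the restore runs on any exit path
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # `expr` is a lazy promise; force() evaluates it here, under "C"
  force(expr)
}

before <- Sys.getlocale("LC_COLLATE")
sorted <- with_c_collation(sort(c("j", "y", "i", "z")))
after <- Sys.getlocale("LC_COLLATE")

print(sorted)
print(identical(before, after))
```

Because the argument is evaluated lazily, the sort() call passed in does not run until force(expr), by which point the collation has already been switched.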

answered Sep 19 '22 13:09 by Gavin Simpson


If you want to do this temporarily, devtools provides the with_collate function:

    library(devtools)
    with_collate("C", sort(letters))
    # [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
    # [20] "t" "u" "v" "w" "x" "y" "z"
    with_collate("lt_LT", sort(letters))
    # [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
    # [20] "s" "t" "u" "v" "w" "x" "z"
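A note for readers on current package versions: to the best of my knowledge, the with_*() helpers have since moved out of devtools into the withr package, so with_collate() may no longer be found after library(devtools). A hedged sketch of the equivalent call, assuming withr is installed (sort_c_withr is a made-up wrapper name used only for illustration):

```r
# Hypothetical wrapper around withr::with_collate(); assumes the
# withr package is installed and stops with a message otherwise.
sort_c_withr <- function(x) {
  if (!requireNamespace("withr", quietly = TRUE)) {
    stop("Package 'withr' is needed for this sketch; install.packages('withr')")
  }
  # Collation is switched to "C" only while sort(x) is evaluated
  withr::with_collate("C", sort(x))
}

if (requireNamespace("withr", quietly = TRUE)) {
  print(sort_c_withr(c("j", "y", "i", "z")))
}
```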
answered Sep 17 '22 13:09 by hadley