
Why does the sort command treat "œ" and "oe" interchangeably in a French locale?

Can someone explain how the sort command handles the character "œ" in a French locale?

$ file file.txt
file.txt: UTF-8 Unicode text, with CRLF line terminators

$ wc -l file.txt
4 file.txt

$ cat file.txt
cœz
coez
coe
cœ

$ sort file.txt
coe
cœ
cœz
coez

$ sort -d file.txt
cœ
coe
coez
cœz

$ env | grep -P "(LC|FR)"
LANG=fr_FR.UTF-8

Whether "œ" sorts before or after "oe" seems random in a regular sort, whereas the character appears to be simply ignored in a dictionary sort (sort -d).

I guess it has something to do with collation, but I'd like to have some insight here.

Benjamin Toueg asked Apr 04 '13 14:04


1 Answer

Dictionary sort may be ignoring the œ ligature because it is not in the ASCII range a-zA-Z (this is a guess).
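One way to check this guess without relying on any particular locale is to emulate it: build a key for each line that keeps only ASCII letters, digits and blanks, then sort on that key. This is only a sketch of the hypothesis, not a description of how sort -d is implemented; the function name dict_sort_sketch and the tab-separated key are assumptions made for illustration.

```shell
# Hypothetical helper: prefix each line with a key that keeps only ASCII
# letters, digits and blanks, sort on that key, then drop the key again.
dict_sort_sketch() {
  while IFS= read -r line; do
    # In the C locale, tr works on bytes, so both bytes of "œ" are deleted.
    key=$(printf '%s' "$line" | LC_ALL=C tr -cd 'a-zA-Z0-9 ')
    printf '%s\t%s\n' "$key" "$line"
  done | LC_ALL=C sort -t "$(printf '\t')" -k1,1 | cut -f2-
}

# Same four lines as in the question (without the CRLF terminators).
result=$(printf '%s\n' cœz coez coe cœ | dict_sort_sketch)
printf '%s\n' "$result"
```

With the four lines from the question, the keys are "cz", "coez", "coe" and "c", so the sketch prints cœ, coe, coez, cœz, the same order that sort -d produced in the question.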

For the regular sort, œ and oe compare as equal in the French locale, so they should come out in whatever order they went in, which seems to be what you are seeing. If that's correct, then given this input:

cœz
coez
cœm
coem
coep
cœp
coe
cœ

You should get this:

coe
cœ
cœm
coem
coep
cœp
cœz
coez
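The prediction above can be reproduced without a French locale by emulating the hypothesis: expand œ to oe to build the comparison key, then sort on that key while keeping the input order of lines whose keys are equal. This is a sketch under that assumption only; it does not show what the locale's collation tables actually do. The function name ligature_sort_sketch is made up for illustration, and -s (stable sort) is a GNU and BSD extension rather than a POSIX option.

```shell
# Hypothetical helper: emulate "œ compares equal to oe" by sorting on a key
# in which the ligature is expanded, keeping input order for equal keys.
ligature_sort_sketch() {
  while IFS= read -r line; do
    key=$(printf '%s' "$line" | LC_ALL=C sed 's/œ/oe/g')
    printf '%s\t%s\n' "$key" "$line"
  # -s keeps tied lines in input order; without it, GNU sort would break
  # ties with a byte-wise comparison of the whole line.
  done | LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 | cut -f2-
}

# Same eight lines as the suggested input.
result=$(printf '%s\n' cœz coez cœm coem coep cœp coe cœ | ligature_sort_sketch)
printf '%s\n' "$result"
```

Under these assumptions the sketch prints the eight lines in exactly the predicted order, and feeding it the four lines from the question gives coe, cœ, cœz, coez, which matches the plain sort output shown there.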

You can use the -c (check whether the file is already sorted) or -r (reverse order) options to test this further.

Ben answered Nov 15 '22 07:11