sort not sorting as expected (space and locale)

Tags: linux, sorting

I want to sort a text file with the Linux sort command. The file looks like this:

v 1006
v10 1
v 1011

I would expect a result like this:

v 1006
v 1011
v10 1

However, using sort, even with all kinds of options, the v10 1 line still ends up in the middle. Why? I would understand v10 1 being either at the top or at the bottom (depending on whether the space character sorts before or after 1), but why is it kept in the middle?

Karel Bílek, asked May 06 '11 09:05


2 Answers

sort uses the system locale to determine the sorting order of characters. My guess is that with your locale, it ignores whitespace when comparing lines.

$ cat foo.txt 
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
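
To check which locale sort will actually pick up, it can help to look at the environment variables that control collation. The following is a minimal sketch, assuming a POSIX shell; the sample lines are the ones from the question, and the variable name `sorted` is only for illustration.

```shell
# Precedence: LC_ALL overrides LC_COLLATE, which overrides LANG.
echo "LC_ALL=${LC_ALL:-unset} LC_COLLATE=${LC_COLLATE:-unset} LANG=${LANG:-unset}"

# Force plain byte-order comparison for this one command only;
# the rest of the shell session keeps its own locale.
sorted=$(printf 'v 1006\nv10 1\nv 1011\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Prefixing the assignment to a single command keeps the change local, which is usually safer than exporting LC_ALL for the whole session.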
Tatu Lahtela, answered Nov 19 '22 15:11


Your locale influences how the lines are sorted. For example, I get this with my current locale:

% echo -e "v 1006\nv10 1\nv 1011" | sort
v 1006
v10 1
v 1011

But with the C locale I get this:

% echo -e "v 1006\nv10 1\nv 1011" | LC_ALL=C sort
v 1006
v 1011
v10 1

I'm not sure exactly why it behaves that way. Setting LC_ALL=C pretty much turns off all locale-aware processing and falls back to byte-by-byte comparison (yes, I'm skipping the details).
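
As a small illustration of the byte-level view, the byte values of a space and of the digit 1 can be inspected with od. This is only a sketch; the variable name `bytes` is made up for the example.

```shell
# Dump the two bytes of the string " 1" as hex, then strip od's padding.
bytes=$(printf ' 1' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # prints 2031: 20 is the space, 31 is the digit 1
```

Since 0x20 is smaller than 0x31, a byte-wise comparison puts the lines starting with "v " ahead of "v10 1".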

Why some locale settings skip spaces is harder to explain, though. If anyone can explain that, it would be good :)
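
One way to see where the middle position could come from is to emulate a comparison that gives spaces no weight. The sketch below is an assumption about what the locale is doing, not a description of the real collation algorithm (which is multi-level and can differ between C library versions): it builds a key with the spaces removed, sorts on that key byte-wise, and then drops the key again. The variable name `emulated` is only for illustration.

```shell
# Decorate each line with a space-free key, sort by that key in byte
# order, then strip the key so only the original lines remain.
tab=$(printf '\t')
emulated=$(printf 'v 1006\nv10 1\nv 1011\n' |
  awk '{ key = $0; gsub(/ /, "", key); print key "\t" $0 }' |
  LC_ALL=C sort -t "$tab" -k1,1 |
  cut -f2-)
printf '%s\n' "$emulated"
```

With the spaces removed, the keys are v1006, v101 and v1011, and v101 falls between the other two. That matches the order seen with the non-C locale above.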

viraptor, answered Nov 19 '22 15:11