I want to sort a text file with the Linux sort command. The file looks like this:
v 1006
v10 1
v 1011
I would expect a result like this:
v 1006
v 1011
v10 1
However, using sort, even with all kinds of options, the v10 1 line is still in the middle. Why? I would understand v10 1 being either on top or on the bottom (depending on whether the space character sorts before or after the character 1), but for what reason is it kept in the middle?
sort uses the system locale to determine the sorting order of characters. My guess is that with your locale, it ignores whitespace.
$ cat foo.txt
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
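One way to probe that guess, as a minimal sketch: delete the spaces from each line and sort what is left in plain byte order. If the locale really gives spaces no weight, the resulting order should match what en_US.utf8 produced above. The variable names are illustrative only, and this merely emulates a space-ignoring collation; it is not a description of what sort does internally.

```shell
# The three sample lines from the question.
input='v 1006
v10 1
v 1011'

# Remove every space, then sort by raw byte values (C locale).
# This only emulates a collation that ignores spaces.
stripped_sorted=$(printf '%s\n' "$input" | tr -d ' ' | LC_ALL=C sort)

printf '%s\n' "$stripped_sorted"
```

With the spaces gone the keys are v1006, v101 and v1011, and v101 lands in the middle, just like the v10 1 line does under en_US.utf8.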
Your locale influences how the lines are sorted. For example, I get this with my current locale:
% echo -e "v 1006\nv10 1\nv 1011" | sort
v 1006
v10 1
v 1011
But with the C locale I get this:
% echo -e "v 1006\nv10 1\nv 1011" | LC_ALL=C sort
v 1006
v 1011
v10 1
I'm not really sure why it behaves that way. LC_ALL=C is pretty much equivalent to turning off all unexpected processing and going back to byte-level operations (yeah, I'm skipping the details).
Why other locale settings skip the space is harder to explain, though. If anyone can explain, that would be good :)
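To make the byte-level comparison concrete, here is a small sketch that prints the byte values of the space and of the digit 1. It relies on the POSIX printf rule that a numeric argument starting with a quote yields the value of the character that follows; the variable names are made up for illustration.

```shell
# A leading single quote in a printf numeric argument yields the
# numeric value of the character that follows it.
space_code=$(printf '%d' "' ")
one_code=$(printf '%d' "'1")

echo "space=$space_code one=$one_code"
```

On an ASCII-compatible system the space is 32 and the digit 1 is 49, so a byte-wise comparison puts v 1011 before v10 1, which is the LC_ALL=C result shown above.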