
bash sort unusual order. Problem with spaces?

Tags: bash, sorting

I lost a lot of time to a bug that traces back to sort...

Can someone explain why I get this unsorted result, when the sort docs tell me the default field separator is the transition from non-blank to blank characters? Shouldn't the lines be ordered by the first field?

>sort myfile.txt
10_10000000 19
10_10000001 20
10_10000002 19
10_10000003 17
10_10000004 16
10_1000000 44
10_10000005 16
10_10000006 16
10_10000007 17
10_10000008 16

Of course, using +0 -1 gives me the result I expect:

>sort +0 -1 myfile.txt
10_1000000 44
10_10000000 19
10_10000001 20
10_10000002 19
10_10000003 17
10_10000004 16
10_10000005 16
10_10000006 16
10_10000007 17
10_10000008 16

Some metainfo:

>type sort
sort is hashed (/bin/sort)

I am using

sort (GNU coreutils) 5.97


>locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
lonestar21 asked Dec 20 '10 19:12


2 Answers

I think you are seeing a locale-based issue. Some (many?) locales affect the way sort works, in that certain characters are ignored when comparing. In this case, it looks like the space between the fields is being ignored when you don't specify the fields to sort on, because sort then compares the whole line rather than just the first field. Remove the space and you can see that the row that looks like it's in the wrong place is actually correct: 10_100000044 falls between 10_1000000416 and 10_1000000516.
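A quick way to check this explanation, as a rough sketch: delete the spaces yourself and then sort bytewise in the C locale. The temporary file and the variable names are only for illustration; the data is the sample from the question. If the explanation holds, the order should match the puzzling output in the question.

```shell
# Recreate the sample input from the question in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' \
  '10_1000000 44' \
  '10_10000000 19' \
  '10_10000001 20' \
  '10_10000002 19' \
  '10_10000003 17' \
  '10_10000004 16' \
  '10_10000005 16' \
  '10_10000006 16' \
  '10_10000007 17' \
  '10_10000008 16' > "$tmpfile"

# Delete every space, then sort by raw byte values (C locale).
# This imitates a collation that ignores spaces.
stripped_sorted=$(tr -d ' ' < "$tmpfile" | LC_ALL=C sort)
printf '%s\n' "$stripped_sorted"
# 10_100000044 lands between 10_1000000416 and 10_1000000516,
# the same position the "misplaced" row has in the question.

rm -f "$tmpfile"
```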

If you run sort with a different locale you'll probably get a different result:

$ LANG=C sort myfile.txt

My default locale is en_AU.UTF-8 and I see your original sort results. When I set LANG=C, I see the results you are expecting.
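Two possible fixes, sketched on the sample data from the question (the temporary file and variable names are illustrative). The first uses LC_ALL=C rather than LANG=C, on the assumption that you want the override to work even when LC_ALL or LC_COLLATE is already set, since both take precedence over LANG. The second keeps your locale but restricts the sort key to the first field with -k1,1, the current spelling of the obsolete +0 -1.

```shell
# Recreate the sample input from the question in a temporary file.
tmpfile=$(mktemp)
printf '%s\n' \
  '10_10000000 19' \
  '10_10000001 20' \
  '10_10000002 19' \
  '10_10000003 17' \
  '10_10000004 16' \
  '10_1000000 44' \
  '10_10000005 16' \
  '10_10000006 16' \
  '10_10000007 17' \
  '10_10000008 16' > "$tmpfile"

# Fix 1: compare raw bytes. A space (0x20) sorts before every digit,
# so '10_1000000 44' comes before '10_10000000 19'.
c_sorted=$(LC_ALL=C sort "$tmpfile")
printf '%s\n' "$c_sorted"

# Fix 2: keep the current locale, but compare only the first field.
key_sorted=$(sort -k1,1 "$tmpfile")
printf '%s\n' "$key_sorted"

rm -f "$tmpfile"
```

Both commands should print 10_1000000 44 as the first line. Note that LC_ALL=C also changes how non-ASCII text and letter case are ordered, so -k1,1 may be the better choice when the locale's ordering matters elsewhere in the data.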

camh answered Nov 20 '22 01:11


Works right for me:

$ sort myfile.txt
10_1000000 44
10_10000000 19
10_10000001 20
10_10000002 19
10_10000003 17
10_10000004 16
10_10000005 16
10_10000006 16
10_10000007 17
10_10000008 16

$ sort --version
sort (GNU coreutils) 8.5

Perhaps your version requires the -n flag to turn on numerical-sort?

SiegeX answered Nov 20 '22 01:11