
Sorting with multiple keys with the Linux sort command

Say I have this file.

$ cat a.txt
c 1002 4
f 1001 1
d 1003 1
a 1001 3
e 1004 2
b 1001 2

I want to sort it by the second column and then by the third column. Column two is numeric, while column three can be treated as a string. I know the following command works well.

$ sort -k2,2n -k3,3 a.txt
f 1001 1
b 1001 2
a 1001 3
c 1002 4
d 1003 1
e 1004 2

However, I thought sort -k2n a.txt would also work, but it does not.

$ sort -k2n a.txt
a 1001 3
b 1001 2
f 1001 1
c 1002 4
d 1003 1
e 1004 2

It seems to sort by column two and then by column one instead of column three. Why is this happening? Is it a bug? Note that sort -k2 a.txt works fine with the data above, because the numbers all have the same width.

My sort version is sort (GNU coreutils) 8.15, running under Cygwin.

yejinxin asked Jun 08 '13 10:06


1 Answer

I found this caution in the GNU sort documentation.

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ‘:’ as the field delimiter.

      sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2n instead of -k 2,2n sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.

I'm not sure what it ends up with when it evaluates '1001 3' as a numeric key, but "will not do what you expect" is accurate. It seems clear that the Right Thing to do is to give each key an explicit start and end field, as in -k2,2n -k3,3.
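
One way to probe what '1001 3' turns into as a numeric key is a quick experiment. The sketch below assumes that numeric comparison reads the number at the start of the key and ignores whatever follows it. The -u option keeps one line from each group of lines whose keys compare equal, so counting the output lines shows how many distinct key values sort actually sees. The scratch file and the variable names are made up for the illustration.

```shell
# Recreate the sample data from the question in a scratch file (hypothetical name).
data=$(mktemp)
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > "$data"

# Key runs from field 2 to the end of the line, compared numerically.
# -u keeps only one line from each group whose keys compare equal.
open_count=$(( $(LC_ALL=C sort -u -k2n "$data" | wc -l) ))

# Same check, but with the key limited to field 2 only.
bounded_count=$(( $(LC_ALL=C sort -u -k2,2n "$data" | wc -l) ))

echo "open-ended key: $open_count distinct values"
echo "bounded key:    $bounded_count distinct values"
rm -f "$data"
```

If both counts come out the same, then '1001 1', '1001 2' and '1001 3' all compare equal as numbers, which would mean the third column never takes part in the -k2n comparison at all.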

The same web page says this about resolving "ties".

Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified.

I'll confess I'm a little mystified about how to interpret that.
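
A small experiment may make the last-resort rule less mysterious. This is only a sketch, assuming GNU sort and the C locale (forced with LC_ALL=C so that whole-line comparison is plain byte order); the scratch file and variable names are invented for the example. With -k2n the three 1001 lines tie on the key, so sort falls back to comparing the whole lines. The -s (--stable) option is documented as disabling that last-resort comparison, so tied lines should keep their input order instead.

```shell
# Recreate the sample data from the question in a scratch file (hypothetical name).
data=$(mktemp)
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > "$data"

# Default behaviour: ties on the numeric key are broken by comparing whole lines.
last_resort=$(LC_ALL=C sort -k2n "$data")

# -s turns the last-resort comparison off, so tied lines keep their input order.
stable=$(LC_ALL=C sort -s -k2n "$data")

printf 'default:\n%s\n\nstable:\n%s\n' "$last_resort" "$stable"
rm -f "$data"
```

The default run should reproduce the ordering from the question (a, b, f among the 1001 lines), which suggests the apparent "sort by column one" is just the whole-line fallback at work. The stable run should list those three lines as f, a, b, the order in which they appear in the file.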

Mike Sherrill 'Cat Recall' answered Sep 22 '22 21:09