 

Sorting pos/neg numbers with fractional parts using Unix sort

Using sort (coreutils) 5.2.1

I have the following file, which I'd like to sort by the numeric part of field 4 (the value after tag=). This can be a negative or positive number with a fractional part, and might also have the value INF.

field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
field1 field2 field3 tag=INF field5 field6

I would like this to be sorted as

field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6

Given that the numeric part of the field starts at character position 4 (assuming the indexing starts at 0, which I'm not sure of), I have tried sort with the following options:

  • sort -g -k4.4 inputfile
  • sort -g -k4.5 inputfile
  • sort -n -k4.4 inputfile
  • sort -n -k4.5 inputfile
  • sort -g inputfile

These all yield the following, which is close, but not quite right. The magnitudes are sorted correctly, but I'd like the most negative value on top.

field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6

How can I make sort behave?

FWIW, here's more information:

LANG = en_US.UTF-8
Red Hat Enterprise Linux WS release 4 (Nahant Update 6)
tomocafe asked Nov 01 '22 12:11


2 Answers

I am on a Mac, so it may be a slightly different implementation, but I found this to work:

sort -gb -k 4.5,4 inputfile

In English: "sort the file inputfile in a -g(eneral) numeric fashion, ignoring leading -b(lanks), using as the -k(ey) the 4th column, from the 5th character of that column to the end of the 4th column"

field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
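
A minimal sketch of why -b matters here, assuming a sort that follows the POSIX field rules (GNU coreutils does): when no -t separator is given, the blank in front of a field counts as part of that field, so without -b the key 4.5 starts at the "=" rather than at the number. That is probably why the attempts in the question did not compare numerically. The temporary file, the variable names and the LC_ALL=C setting are assumptions made for this illustration only.

```shell
# Sample lines from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
EOF

# With -b the leading blank is skipped, so character 5 of field 4
# is the first character after "tag=".
with_b=$(LC_ALL=C sort -g -b -k 4.5,4 "$input")

# Without -b the leading blank is character 1 of the field,
# so the number starts one position later, at 4.6.
without_b=$(LC_ALL=C sort -g -k 4.6,4 "$input")

printf '%s\n' "$with_b"
rm -f "$input"
```

Both invocations should produce the same order; the -b form is easier to reason about when fields may be separated by more than one blank.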
James Webster answered Nov 09 '22 17:11


You could add a pre-processing awk step that appends a new field containing the numeric value from field 4, sort by that field, and then strip it again in a post-processing step. Note that in the example below INF is replaced by an arbitrarily high value of 10^10; use a higher value if your input contains naturally occurring numbers that exceed it.

awk '{x=$4; sub("tag=", "", x); sub("INF", 10^10, x); print $0, x}' file.txt |
sort -k7,7g |
cut -f-6 -d' '
field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=4.22 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=INF field5 field6
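
The same decorate, sort, undecorate idea can be sketched as a small function that prepends the key instead of appending it, so the final cut does not depend on how many fields a line has. This is only an illustration: the function name sort_by_tag, the temporary file, the 1e10 sentinel for INF and the LC_ALL=C setting are all assumptions, and it presumes the fields are separated by single spaces.

```shell
# Decorate each line with the numeric value of field 4 as a new first
# field, sort on that field with -g, then cut the temporary field off.
sort_by_tag() {
    awk '{
        x = $4
        sub(/^tag=/, "", x)          # keep only the part after "tag="
        if (x == "INF") x = "1e10"   # sentinel standing in for INF
        print x, $0
    }' "$1" |
    LC_ALL=C sort -k1,1g |
    cut -d' ' -f2-
}

# Sample lines from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
field1 field2 field3 tag=INF field5 field6
field1 field2 field3 tag=0.123 field5 field6
field1 field2 field3 tag=5.77 field5 field6
field1 field2 field3 tag=-1.92 field5 field6
field1 field2 field3 tag=-1.91 field5 field6
EOF

result=$(sort_by_tag "$input")
printf '%s\n' "$result"
rm -f "$input"
```

As with the original pipeline, the sentinel has to be larger than any real value in the input.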
iruvar answered Nov 09 '22 15:11