
Sorting Numerically with awk (gawk)

Tags: awk, gawk

While trying to answer a question, I wrote the following GNU awk script and ran into an issue with sorting (I should have read the manual first).

From the manual:

Because IGNORECASE affects string comparisons, the value of IGNORECASE also affects sorting for both asort() and asorti(). Note also that the locale's sorting order does not come into play; comparisons are based on character values only.

This was the proposed solution:

awk '{
    lines[$0]=length($0)
}
END {
    for(line in lines) { tmp[lines[line],line] = line }
    n = asorti(tmp)
    for(i=1; i<=n; i++) {
        split(tmp[i], tmp2, SUBSEP); 
        ind[++j] = tmp2[2]
    }
    for(i=n; i>0; i--)
        print ind[i],lines[ind[i]]
}' file

Output:

aaaaa foo 9
aaa foooo 9
aaaa foo 8
aaa foo 7
as foo 6
a foo 5
aaaaaaa foooo 13

I tried adding 0 to force a numeric type, but still could not get the desired output. Is there a way to simulate a numeric sort in awk/gawk?

Input File:

aaa foooo
aaaaaaa foooo
a foo
aaa foo
aaaaa foo
as foo
aaaa foo

Desired Output:

aaaaaaa foooo
aaaaa foo     # Doesn't matter which one comes first (both are the same size)
aaa foooo     # Doesn't matter which one comes first (both are the same size)
aaaa foo
aaa foo
as foo
a foo

The numbers shown in the script output are only there to illustrate how the sorting was done.

jaypal singh asked Nov 28 '22 01:11


2 Answers

See this example, Jaypal:

kent$  cat f
3333333
50
100
25
44

kent$  awk '{a[$0]}END{asorti(a,b);for(i=1;i<=NR;i++)print b[i]}' f          
100
25
3333333
44
50

kent$  awk '{a[$0]}END{asorti(a,b,"@val_num_asc");for(i=1;i<=NR;i++)print b[i]}' f
25
44
50
100
3333333
Kent answered Dec 26 '22 12:12


The problem is that you're calling asorti(), which sorts on array indices. By definition all awk array indices are strings, so the sorting is string-based. You can pad the numbers with leading zeros, for example with str=sprintf("%20s",num); gsub(/ /,0,str), so that every string has the same length (e.g. 001, 010 and 100 instead of 1, 10 and 100). Alternatively, store the numbers as array elements and sort those with asort() instead of sorting indices with asorti(), since array elements can be either strings or numbers.

Ed Morton answered Dec 26 '22 12:12