
How to use a Unix sort command to sort by human-readable numeric file size in a column?

This question is now answered; scroll to the end of this post for the solution.

Apologies if the answer is already here, but all the answers I have found so far suggest either the -h flag or the -n flag, and neither of those is working for me...

I have some output from a curl command that is giving me several columns of data. One of those columns is a human-readable file size ("1.6mb", "4.3gb" etc).

I am using the Unix sort command to sort by the relevant column, but it appears to be trying to sort alphabetically instead of numerically. I have tried using both the -n and the -h flags, but although they do change the order, in neither case is the order numerically correct.

I am on a CentOS Linux box, version 7.2.1511. The version of sort I have is "sort (GNU coreutils) 8.22".

I have tried using the -h flag in these different formats:

curl localhost:9200/_cat/indices | sort -k9,9h | head -n5
curl localhost:9200/_cat/indices | sort -k9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k9h | head -n5

I always get these results:

green open indexA            5 1        0       0   1.5kb    800b
green open indexB            5 1  9823178 2268791 152.9gb  76.4gb
green open indexC            5 1    35998    7106 364.9mb 182.4mb
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexE            5 1        0       0   1.5kb    800b

I have tried using the -n flag in the same formats as above:

curl localhost:9200/_cat/indices | sort -k9,9n | head -n5
curl localhost:9200/_cat/indices | sort -k9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k9n | head -n5

I always get these results:

green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb

Edit: It turned out there were two problems:

1) sort expects single-letter suffixes - K (or k), M and G instead of kb, mb and gb (for bytes you can just leave the suffix off).

2) sort will include leading spaces unless you explicitly exclude them, which messes with the ordering.

The solution is to replace lower case with upper case and use the -b flag to make sort ignore leading spaces (I've based this answer on @Vinicius' solution below, because it's easier to read if you don't know regex):

curl localhost:9200/_cat/indices | tr '[kmg]b' '[KMG] ' | sort -k9hb
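One thing to be aware of: tr translates single characters, so it changes every k, m, g and b on the line, not just the ones in the size columns. The sketch below, using one sample row from above, shows that side effect next to a GNU sed alternative that only touches a unit letter directly after a digit. It assumes GNU sed (\U is a GNU extension) and assumes index names contain no digit followed by k, m or g; the variable names are only for illustration.

```shell
# One sample row taken from the output above.
row='green open index4      1 0        3       0   3.9kb   3.9kb'

# tr maps characters one-to-one, so "green" becomes "Green" as well.
tr_out=$(printf '%s\n' "$row" | tr 'kmgb' 'KMG ')
printf '%s\n' "$tr_out"

# GNU sed: uppercase only a k/m/g that directly follows a digit.
sed_out=$(printf '%s\n' "$row" | sed 's/[0-9][kmg]/\U&/g')
printf '%s\n' "$sed_out"
```

For sorting alone the tr side effect is harmless, but it matters if the output is read afterwards or if index names contain those letters.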

ClareSudbery asked Oct 19 '25 16:10


1 Answer

Your 'm' and 'g' units should be uppercase. The GNU sort manual reads:

-h --human-numeric-sort --sort=human-numeric

Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or ‘k’ or ‘K’, or one of ‘MGTPEZY’, in that order; see Block size); and finally by numeric value.
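
To see that ordering in isolation, here is a minimal sketch with made-up values (not taken from the curl output). It assumes GNU sort; LC_ALL=C is set only to keep the decimal point and collation predictable.

```shell
# Made-up sizes using the suffixes the manual lists.
sizes='2M
1G
3K
500
1.5K'

# Sorted by suffix first (none, K, M, G), then by numeric value.
sorted=$(printf '%s\n' "$sizes" | LC_ALL=C sort -h)
printf '%s\n' "$sorted"
# prints: 500, 1.5K, 3K, 2M, 1G (one per line)
```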

You can change the output of curl with GNU sed like this:

curl localhost:9200/_cat/indices \
| sed 's/[0-9][mgtpezy]/\U&/g' \
| sort -k9,9h \
| head -n5

Yields:

green open index4      1 0        3       0   3.9kb   3.9kb
green open index1      5 1     1021       0   3.2Mb   1.6Mb
green open index2      5 1     8833       0   4.1Mb     2Mb
green open index3      5 1     4500       0     5Mb   2.5Mb
green open index5      3 1  2516794       0   8.6Gb   4.3Gb

Other letters like "b" will be treated as "no unit":

green open indexA            5 1        0       0   1.5kb    800b
green open indexE            5 1        0       0   1.5kb    800b
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexC            5 1    35998    7106 364.9Mb 182.4Mb
green open indexB            5 1  9823178 2268791 152.9Gb  76.4Gb

If so desired, you can change the units in the sorted output back to lowercase by piping to sed 's/[0-9][MGTPEZY]/\L&/g'
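
Putting the three steps together, a rough sketch of the round trip might look like this. It uses three sample rows from the question in place of the live curl output, and assumes GNU sed and GNU sort.

```shell
# Sample rows copied from the question; in real use this would be the curl output.
rows='green open index1      5 1     1021       0   3.2mb   1.6mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb'

# Uppercase the units, sort on column 9, then lowercase the units again.
result=$(printf '%s\n' "$rows" \
  | sed 's/[0-9][mgtpezy]/\U&/g' \
  | LC_ALL=C sort -k9,9h \
  | sed 's/[0-9][MGTPEZY]/\L&/g')
printf '%s\n' "$result"
```

The rows come out ordered index4, index1, index5, with the original lowercase units restored.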

xhienne answered Oct 22 '25 07:10


