How to do natural sort on uniq -c output?
When the counts are <10, the uniq -c | sort output looks fine:

    alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
    alvas@ubi:~/testdir$ cat test.txt
    aaa
    bbb
    aa
    cd
    ada
    aaa
    bbb
    aa
    ccd
    aa
    alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
          1 ada
          1 ccd
          1 cd
          2 aaa
          2 bbb
          3 aa
But when the counts reach 10 or more (let alone hundreds or thousands), the order is wrong, because sort compares the lines as strings rather than as integers:

    alvas@ubi:~/testdir$ echo -e "aaa\nbbb\naa\nnaa\nnaa\naa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnaa\nnnaa\ncd\nada\naaa\nbbb\naa\nccd\naa" > test.txt
    alvas@ubi:~/testdir$ cat test.txt | sort | uniq -c | sort
         10 naa
          1 ada
          1 ccd
          1 cd
          1 nnaa
          2 aaa
          2 bbb
          4 aa
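
The string-versus-integer difference can be reproduced with three bare numbers. This is only a minimal sketch: the function names are made up for illustration, and LC_ALL=C is an assumption added here so the result does not depend on the locale.

```shell
# Sort the same three numbers as strings and as integers.
# lexical_order / numeric_order are hypothetical names used only in this sketch.
# LC_ALL=C is assumed so that the comparison is bytewise.
lexical_order() { printf '%s\n' 10 9 2 | LC_ALL=C sort; }
numeric_order() { printf '%s\n' 10 9 2 | LC_ALL=C sort -n; }

lexical_order   # prints 10, 2, 9 (one per line): "1" sorts before "2" and "9"
numeric_order   # prints 2, 9, 10 (one per line)
```

Compared as a string, "10" comes before "2" because the first character decides the order, which is the same effect seen in the counts above.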
How do I natural-sort the output of "uniq -c" in descending/ascending order?
The -r flag of the sort command reverses the result of the comparisons, so a sort that would otherwise be ascending comes out in descending order.
What are sort and uniq? Ordering and manipulating data in Linux-based text files can be carried out using the sort and uniq utilities. The sort command can order lines alphabetically or numerically, whereas the uniq command removes adjacent duplicate lines.
The man page for uniq notes: "Repeated lines in the input will not be detected if they are not adjacent, so it may be necessary to sort the files first." Following that suggestion, sorting the list before calling uniq ensures that all of the duplicates are removed.
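
The adjacency rule is easy to see with a tiny input. The following is a sketch only: the sample words and the function names are invented for illustration, and LC_ALL=C is an assumption to keep the sort order predictable.

```shell
# Count duplicates with and without sorting first.
# count_unsorted / count_sorted are hypothetical names; the words are sample data.
count_unsorted() { printf '%s\n' aa bb aa | uniq -c; }
count_sorted()   { printf '%s\n' aa bb aa | LC_ALL=C sort | uniq -c; }

count_unsorted   # three lines, each with count 1: the two "aa" lines are not adjacent
count_sorted     # two lines: "aa" with count 2, "bb" with count 1
```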
Use -n in your sort command, so that it sorts numerically. Also, -r allows you to reverse the result:

    $ sort test.txt | uniq -c | sort -n
          1 ada
          1 ccd
          1 cd
          1 nnaa
          2 aaa
          2 bbb
          4 aa
         10 naa
    $ sort test.txt | uniq -c | sort -nr
         10 naa
          4 aa
          2 bbb
          2 aaa
          1 nnaa
          1 cd
          1 ccd
          1 ada
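
With -nr, lines that share the same count also come out in reverse order (2 bbb before 2 aaa above). If that matters, one possible variation, which is not part of the answer itself, is to sort on two keys: the count numerically in reverse, then the word in ascending order. The helper name count_desc is hypothetical, and LC_ALL=C is assumed so that ties are ordered bytewise.

```shell
# count_desc: read lines on stdin and print "count word", highest count first.
# Hypothetical helper for illustration; LC_ALL=C is an assumption made here.
count_desc() {
    # -k1,1nr : first field (the count), numeric, reversed
    # -k2     : remaining text, ascending, used to break ties
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2
}

printf '%s\n' b a c c d | count_desc   # 2 c, then 1 a, 1 b, 1 d (one per line)
```

For the file used above, this could be called as count_desc < test.txt.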
From man sort:

    -n, --numeric-sort
           compare according to string numerical value
    -r, --reverse
           reverse the result of comparisons