I have a space-delimited tabular file that looks like this:
>NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG
>NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG
>NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC
How can I use Unix sort to sort it by the length of the DNA sequence [ATCG]?
If the length is in the 4th column, sort -n -k4 should do the trick.
If the length has to be computed from the sequence itself, then you need a preprocessing step before sort. For example, a Python script could print the length of the 7th space-separated column as an extra first or last column, and sort could then work on that column.
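A minimal sketch of such a preprocessing step, assuming the sequence is always the 7th whitespace-separated field, as in the sample from the question. The function name prepend_length and its column parameter are made up for illustration.

```python
def prepend_length(line, column=7):
    """Return the line with the length of the chosen field prepended.

    column is 1-based, to match the column numbering that sort -k uses.
    """
    fields = line.split()
    return f"{len(fields[column - 1])} {line}"


# Sample rows copied from the question.
sample = [
    ">NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG",
    ">NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG",
    ">NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC",
]

# Print each row with the sequence length as a new first column.
for row in sample:
    print(prepend_length(row))
```

To use this in a pipeline, one option is to loop over sys.stdin instead of the sample list, pipe the output through sort -n -k1,1, and then drop the helper column again with cut -d' ' -f2-.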
This pipeline also computes the length: it prepends the length of the last field, sorts numerically on it, and then strips it off again. My Unix is a bit rusty; I have been doing other things for a while.
$ awk '{printf("%d %s\n", length($NF), $0)}' junk.lst | sort -n -k1,1 | sed 's/^[0-9]* //'