
How To Sort Tab Format File Based on Length of Column K

I have a space delimited tabular file that looks like this:

>NODE 28 length 23 cov 11.043478 ACATCCCGTTACGGTGAGCCGAAAGACCTTATGTATTTTGTGG
>NODE 32 length 21 cov 13.857142 ACAGATGTCATGAAGAGGGCATAGGCGTTATCCTTGACTGG
>NODE 33 length 28 cov 14.035714 TAGGCGTTATCCTTGACTGGGTTCCTGCCCACTTCCCGAAGGACGCAC

How can I use Unix sort to sort it by length of DNA sequence [ATCG]?

neversaint asked Dec 23 '22 01:12


2 Answers

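If the length you want is already in the 4th column, sort -n -k4,4 should do the trick (the ,4 stops the key at the end of that column instead of running to the end of the line).

If the length has to be computed from the sequence itself, you need a preprocessing step before sort. Perhaps a short Python script that prints the length of the 7th space-separated column as an extra first or last column, which sort can then use as its key.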
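A minimal sketch of such a preprocessing step, assuming the sequence is always the 7th space-separated column as in the sample above. The names `prefix_with_length`, `SEQ_FIELD`, and `main` are illustrative only and not part of any existing tool.

```python
import sys

# Zero-based index of the 7th space-separated column (assumed layout).
SEQ_FIELD = 6


def prefix_with_length(line):
    """Return the line with the length of its sequence column prepended."""
    fields = line.split()
    return f"{len(fields[SEQ_FIELD])} {line}"


def main(stream=sys.stdin):
    """Print every non-blank input line prefixed with its sequence length."""
    for raw in stream:
        line = raw.rstrip("\n")
        if line.strip():  # skip blank lines
            print(prefix_with_length(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as add_len.py, it could be used as python add_len.py < contigs.txt | sort -n -k1,1 | cut -d' ' -f2- where the final cut removes the helper column again.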

Slartibartfast answered Dec 27 '22 07:12


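This pipeline also works out the length itself: awk prepends the length of the last field to each line, sort orders the lines numerically on that first column, and sed strips it off again. My Unix is a bit rusty, as I have been doing other things for a while.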

$ awk '{printf("%d %s\n", length($NF), $0)}' junk.lst | sort -n -k1,1 | sed 's/^[0-9]* //'

josephj1989 answered Dec 27 '22 08:12