I have a text file like this example:
chr12 58146000 58146050 79 chr12 58145961 58146075 CDK4
chr12 58146050 58146075 81 chr12 58145961 58146075 CDK4
chr12 69082750 69082800 57 chr12 69082741 69082833 NUP107
chr12 99038450 99038479 81 chr12 99038300 99038479 IKBIP
chr12 104680862 104680887 512 chr12 104680862 104680887 TXNRD1
chr12 104682708 104682750 134 chr12 104682708 104682818 TXNRD1
I want to group the rows by column 8 and sum the column 4 values within each group. The result should be a tab-separated file with 2 columns: the first column is the sum (from column 4) and the second column is the group name (from column 8). I tried the following code, but it does not return what I want. Do you know how to fix it?
cut -d'\t' -f 8 | sort | uniq -c | awk '{ print sum($4), $8 }' infile > outfile
Here is the expected output:
160 CDK4
57 NUP107
81 IKBIP
646 TXNRD1
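A single awk command does the whole job. It adds column 4 into an array indexed by column 8, then prints each total followed by its group name at the end. The for (grp in sum) loop visits the groups in an unspecified order, so the lines below are not in input order.
$ awk -v OFS='\t' '{sum[$8]+=$4} END{for (grp in sum) print sum[grp], grp}' file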
81 IKBIP
57 NUP107
646 TXNRD1
160 CDK4
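If the groups need to come out in the order they first appear, as in the expected output, one option is to record each new column-8 value in a second, numerically indexed array and loop over that array at the end. The sketch below assumes whitespace-separated input like the example. It writes the sample rows to a temporary file only so that it runs on its own; the $infile and $outfile names are placeholders for your real files.

```shell
# Placeholder files: a temporary copy of the sample rows from the question,
# and a temporary output file (use your own paths instead).
infile=$(mktemp)
outfile=$(mktemp)
cat > "$infile" <<'EOF'
chr12 58146000 58146050 79 chr12 58145961 58146075 CDK4
chr12 58146050 58146075 81 chr12 58145961 58146075 CDK4
chr12 69082750 69082800 57 chr12 69082741 69082833 NUP107
chr12 99038450 99038479 81 chr12 99038300 99038479 IKBIP
chr12 104680862 104680887 512 chr12 104680862 104680887 TXNRD1
chr12 104682708 104682750 134 chr12 104682708 104682818 TXNRD1
EOF

# Sum column 4 per distinct value of column 8 and print the groups in
# order of first appearance, because a plain for-in loop has no fixed order.
awk -v OFS='\t' '
    !($8 in sum) { order[++n] = $8 }   # first time this group is seen: remember its position
    { sum[$8] += $4 }                  # accumulate column 4 for this group
    END { for (i = 1; i <= n; i++) print sum[order[i]], order[i] }
' "$infile" > "$outfile"

cat "$outfile"
rm -f "$infile"
```

The membership test ($8 in sum) runs before the line that adds to sum, and it does not create an array element. Because of this, each group name is stored in order exactly once.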