
grouping and summarizing a text file in awk

Tags: awk

I have a text file like this example:


chr12   58146000    58146050    79  chr12   58145961    58146075    CDK4
chr12   58146050    58146075    81  chr12   58145961    58146075    CDK4
chr12   69082750    69082800    57  chr12   69082741    69082833    NUP107
chr12   99038450    99038479    81  chr12   99038300    99038479    IKBIP
chr12   104680862   104680887   512 chr12   104680862   104680887   TXNRD1
chr12   104682708   104682750   134 chr12   104682708   104682818   TXNRD1

I want to group the rows by column 8 and sum the values of column 4 within each group. The result should be a tab-separated file with 2 columns: the first column is the sum (from the 4th column) and the second column is the group name (from the 8th column). I tried the following code, but it does not return what I want. Do you know how to fix it?

cut -d'\t' -f 8 | sort | uniq -c | awk '{ print sum($4), $8 }' infile > outfile

Here is the expected output:

160 CDK4
57  NUP107
81  IKBIP
646 TXNRD1
elly asked Dec 29 '25 05:12


1 Answer

$ awk -v OFS='\t' '{sum[$8]+=$4} END{for (grp in sum) print sum[grp], grp}' file
81      IKBIP
57      NUP107
646     TXNRD1
160     CDK4
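
The sums above match the expected output, but the groups come out in a different order, because `for (grp in sum)` visits array keys in an unspecified order. If the order of first appearance matters, one possible variant is to record each group name the first time it is seen. This is only a sketch and not part of the original answer: the `order` array and the counter `n` are names introduced here for illustration, and `infile` and `outfile` are taken from the command in the question.

```shell
# Recreate the sample input from the question (whitespace-separated columns).
cat > infile <<'EOF'
chr12   58146000    58146050    79  chr12   58145961    58146075    CDK4
chr12   58146050    58146075    81  chr12   58145961    58146075    CDK4
chr12   69082750    69082800    57  chr12   69082741    69082833    NUP107
chr12   99038450    99038479    81  chr12   99038300    99038479    IKBIP
chr12   104680862   104680887   512 chr12   104680862   104680887   TXNRD1
chr12   104682708   104682750   134 chr12   104682708   104682818   TXNRD1
EOF

# Sum column 4 per group in column 8. The illustrative "order" array
# remembers each group name the first time it appears, so the END block
# can print the groups in input order instead of awk's arbitrary order.
awk -v OFS='\t' '
    !($8 in sum) { order[++n] = $8 }   # first time this group is seen
    { sum[$8] += $4 }                  # accumulate column 4 for the group
    END { for (i = 1; i <= n; i++) print sum[order[i]], order[i] }
' infile > outfile

cat outfile
```

The first rule has to come before the accumulation rule: once `sum[$8]` has been assigned, the `in` test is true for that group and the name is not recorded a second time.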
Ed Morton answered Jan 01 '26 04:01


