
sum column based on two matching fields using awk

Tags:

awk

I can't seem to find an awk solution for this simple task. I can easily sum a column ($3) based on one matching field ($1) with:

awk -F, '{array[$1]+=$3} END { for (i in array) {print i"," array[i]}}' datas.csv

Now, how can I do that based on two fields? Let's say $1 and $2. Here is some sample data:

P1,gram,10  
P1,tree,12  
P1,gram,34  
P2,gram,23  
...

I simply need to sum column 3 when both the first and second fields match.

Thanks for any help!

Chargaff asked Aug 07 '11 01:08


2 Answers

Like so:

awk -F, '{array[$1","$2]+=$3} END { for (i in array) {print i"," array[i]}}' datas.csv

My result

P1,tree,12
P1,gram,44
P2,gram,23

EDIT

As the OP needs the commas to remain in the output, I edited the answer above using @yi_H's "comma fix".
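
To spell out why this works: the array index is the string built from $1, a comma, and $2, so every distinct pair of fields gets its own running total. Below is a minimal sketch of the same idea wrapped in a shell function. The function name `sum_by_two_fields` and the inline sample rows are only for illustration, and the trailing `sort` is an addition, since awk does not guarantee any particular order for `for (key in array)`.

```shell
# Hypothetical helper: sums column 3 for each distinct ($1, $2) pair.
# Reads the files given as arguments, or standard input if there are none.
sum_by_two_fields() {
    awk -F, '
        # Build a composite key from the first two fields and accumulate $3.
        { sum[$1 "," $2] += $3 }
        # Print one line per key; iteration order is unspecified in awk.
        END { for (key in sum) print key "," sum[key] }
    ' "$@" | sort    # sort only to make the output order predictable
}

# Sample rows taken from the question.
printf '%s\n' 'P1,gram,10' 'P1,tree,12' 'P1,gram,34' 'P2,gram,23' | sum_by_two_fields
```

With the four sample rows this should print P1,gram,44, then P1,tree,12, then P2,gram,23. To run it on the original file, call `sum_by_two_fields datas.csv`.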

Ray Toal answered Oct 11 '22 09:10


For a solution needing less memory, but needing sorting first (nothing is free):

sort datas.csv | awk -F "," 'NR==1{last=$1 "," $2; sum=0;}{if (last != $1 "," $2) {print last "," sum; last=$1 "," $2; sum=0;} sum += $3;}END{print last "," sum;}'
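
The trade-off here is that only one running total is held at a time, because sorting puts all rows with the same $1,$2 pair next to each other. Below is a sketch of the same streaming idea laid out over several lines. It is a variation, not the command above: it assumes it is acceptable to sort explicitly on the two key fields with `sort -t, -k1,1 -k2,2` instead of on the whole line, it skips the final print on empty input, and the function name `sum_sorted_groups` is made up for the example.

```shell
# Hypothetical helper: streaming sum of column 3 per ($1, $2) pair.
# Reads the files given as arguments, or standard input if there are none.
sum_sorted_groups() {
    # Sort on field 1, then field 2, so equal pairs end up adjacent.
    sort -t, -k1,1 -k2,2 "$@" | awk -F, '
        { key = $1 "," $2 }
        # The key changed: emit the finished group and reset the total.
        NR > 1 && key != last { print last "," sum; sum = 0 }
        { last = key; sum += $3 }
        # Emit the final group, unless there was no input at all.
        END { if (NR > 0) print last "," sum }
    '
}

# Sample rows taken from the question.
printf '%s\n' 'P1,gram,10' 'P1,tree,12' 'P1,gram,34' 'P2,gram,23' | sum_sorted_groups
```

Memory use in awk stays constant no matter how many distinct pairs there are, but `sort` still has to process the whole input first.
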
jfg956 answered Oct 11 '22 08:10