I can't seem to find an awk solution for this simple task. I can easily sum a column ($3) based on one matching field ($1) with:
awk -F, '{array[$1]+=$3} END { for (i in array) {print i"," array[i]}}' datas.csv
Now, how can I do that based on two fields, say $1 and $2? Here is some sample data:
P1,gram,10
P1,tree,12
P1,gram,34
P2,gram,23
...
I simply need to sum column 3 when both the first and second fields match.
Thanks for any help!
Like so:
awk -F, '{array[$1","$2]+=$3} END { for (i in array) {print i"," array[i]}}' datas.csv
My result:
P1,tree,12
P1,gram,44
P2,gram,23
EDIT
As the OP needs the commas to remain in the output, I edited the answer above using @yi_H's "comma fix".
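If you would rather not build the key by hand, awk's comma subscript does the joining for you: array[$1,$2] stores the key as $1 SUBSEP $2, and split() on SUBSEP gets the two fields back in END. Here is a minimal sketch of that variant. The function name sum_pairs is made up for illustration, and the trailing sort is only there because the order of a for-in loop is unspecified (which is also why tree can come before gram in the result above).

```shell
# sum_pairs is a hypothetical helper name, not something awk provides.
# It reads CSV rows from stdin (or from files given as arguments).
sum_pairs() {
    awk -F, '
        # The comma subscript joins $1 and $2 with SUBSEP internally.
        { sum[$1,$2] += $3 }
        END {
            for (key in sum) {
                split(key, part, SUBSEP)   # recover the two original fields
                print part[1] "," part[2] "," sum[key]
            }
        }
    ' "$@" | LC_ALL=C sort   # for-in order is unspecified, so sort for stable output
}

# Sample rows from the question
printf '%s\n' 'P1,gram,10' 'P1,tree,12' 'P1,gram,34' 'P2,gram,23' | sum_pairs
```

The one advantage over the "," key is that SUBSEP is a control character by default, so it cannot be confused with anything in the data.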
For a solution needing less memory, but needing sorting first (nothing is free):
sort datas.csv | awk -F "," 'NR==1{last=$1 "," $2; sum=0;}{if (last != $1 "," $2) {print last "," sum; last=$1 "," $2; sum=0;} sum += $3;}END{print last "," sum;}'
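The same idea spread over several lines may be easier to follow. This is only a sketch under a few assumptions: sum_sorted_pairs is a made-up name, the sort keys are spelled out as fields 1 and 2, and LC_ALL=C is set so that grouping does not depend on the locale's collation rules. It also prints nothing on empty input instead of a lone "," line.

```shell
# sum_sorted_pairs is a hypothetical helper name.
# It reads CSV rows from stdin (or from files given as arguments).
sum_sorted_pairs() {
    # Sort on fields 1 and 2 so rows with the same pair are adjacent.
    LC_ALL=C sort -t, -k1,1 -k2,2 "$@" | awk -F, '
        { key = $1 "," $2 }
        # The pair changed: emit the finished group and reset the running total.
        NR > 1 && key != last { print last "," sum; sum = 0 }
        { last = key; sum += $3 }
        # Emit the final group, but only if there was any input at all.
        END { if (NR > 0) print last "," sum }
    '
}

# Sample rows from the question
printf '%s\n' 'P1,gram,10' 'P1,tree,12' 'P1,gram,34' 'P2,gram,23' | sum_sorted_pairs
```

Only one key and one running total are held in awk at a time, which is where the memory saving comes from; the cost moves to sort.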