
GROUP BY/SUM from shell

I have a large file containing data like this:

a 23
b 8
a 22
b 1

I want to be able to get this:

a 45
b 9

I can first sort this file and then do it in Python by scanning the file once. What is a good direct command-line way of doing this?

Legend asked Apr 23 '12 18:04



1 Answer

Edit: The modern (GNU/Linux) solution, as mentioned in the comments years ago ;-)

awk '{
    arr[$1]+=$2
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }' file \
   | sort -k1,1
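As a quick sanity check, here is a minimal sketch that runs the same awk aggregation on the four sample lines from the question. The temporary file and the variable names `tmp` and `result` are only for illustration and are not part of the original answer.

```shell
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'a 23' 'b 8' 'a 22' 'b 1' > "$tmp"

# Sum column 2 per key in column 1. The order of "for (key in arr)" is
# unspecified in awk, so sort by the key to get stable output.
result=$(awk '{ arr[$1] += $2 }
  END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }' "$tmp" \
  | sort -k1,1)

printf '%s\n' "$result"
rm -f "$tmp"
```

This should print `a` and `b` with their sums 45 and 9, tab-separated, matching the output asked for in the question. No pre-sorting of the input is needed, because awk keeps one running total per key in memory.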

The originally posted solution, based on old Unix sort options:

awk '{
    arr[$1]+=$2
   }
   END {
     for (key in arr) printf("%s\t%s\n", key, arr[key])
   }' file \
   | sort +0n -1
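Note that many current sort implementations no longer accept the obsolete zero-based `+POS1 -POS2` key syntax by default and may treat `+0n` as a file name. As far as I know, the `-k` spelling of the same key is `-k1,1n` (field 1 only, compared numerically). A small sketch, using the sample data from the question and the illustrative variable names `sums` and `legacy_equiv`:

```shell
# Sample data from the question, fed through the same awk aggregation.
sums=$(printf '%s\n' 'a 23' 'b 8' 'a 22' 'b 1' |
  awk '{ arr[$1] += $2 }
    END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }')

# -k1,1n is the -k style spelling of the obsolete "+0n -1" key:
# sort on the first field only, comparing it numerically.
legacy_equiv=$(printf '%s\n' "$sums" | sort -k1,1n)
printf '%s\n' "$legacy_equiv"
```

With non-numeric keys such as `a` and `b`, the numeric comparison sees them all as equal, so sort falls back to comparing whole lines; for keys like these, plain `-k1,1` is the more natural choice.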

I hope this helps.

shellter answered Oct 07 '22 16:10