I have a large file containing data like this:
a 23
b 8
a 22
b 1
I want to be able to get this:
a 45
b 9
I can first sort this file and then do it in Python by scanning the file once. What is a good direct command-line way of doing this?
Using a backslash: the backslash (\) is an escape character that tells the shell not to interpret the next character. If that next character is a newline, the shell treats the command as continuing on the following line, which lets a single command span multiple lines. The backslash must be the very last character on the line; a backslash followed by a space escapes the space instead.
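As a small illustration of line continuation, here is a sketch that splits one pipeline across two lines. The inline sample data and the variable name `total` are assumptions made for the example, not part of the original post.

```shell
# The trailing backslash joins the next line to this command,
# so the shell sees a single pipeline: printf ... | awk ...
total=$(printf '%s\n' 'a 23' 'b 8' 'a 22' 'b 1' \
  | awk '$1 == "a" { sum += $2 } END { print sum }')

# Sum of the values whose key is "a" (23 + 22)
echo "$total"
```

Without the backslash, the shell would end the command after `printf` and then fail to parse a line that starts with `|`.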
A loop runs over the whole list of columns, and each value is added to the variable x, which at the end of the loop holds the sum of all the numbers in the line. Alternatively, a while loop can read each column into its own variable, and the variables are then summed using the $(( )) notation.
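The two approaches described above might look roughly like this in POSIX shell. The function name `sum_line`, the variable names, and the sample numbers are all hypothetical and chosen only for illustration.

```shell
# Approach 1: loop over every whitespace-separated column of one line
# and accumulate the values in x.
sum_line() {
  x=0
  for value in $1; do        # unquoted on purpose: word splitting yields the columns
    x=$((x + value))
  done
  echo "$x"
}

sum_line '3 5 7 10'

# Approach 2: a while loop reads each column into its own variable,
# and the variables are summed with $(( )).
row_sums=$(printf '%s\n' '1 2 3' '4 5 6' | while read -r c1 c2 c3; do
  echo $((c1 + c2 + c3))
done)
echo "$row_sums"
```

The second form assumes a fixed number of columns per row; the first works for any number of columns but treats the whole line as one list.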
Edit: The modern (GNU/Linux) solution, as mentioned in the comments years ago ;-) :
awk '{ arr[$1]+=$2 } END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }' file \
  | sort -k1,1
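As a quick check, the sample from the question can be piped through the same awk program. This sketch feeds the four lines inline with `printf` instead of reading a file, and the variable name `result` is an assumption for the example.

```shell
# Sum the second column per key in the first column, then sort by key.
# awk's "for (key in arr)" visits keys in unspecified order, hence the sort.
result=$(printf '%s\n' 'a 23' 'b 8' 'a 22' 'b 1' \
  | awk '{ arr[$1] += $2 } END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }' \
  | sort -k1,1)

printf '%s\n' "$result"
```

This prints the two tab-separated lines `a 45` and `b 9`. awk keeps one array entry per distinct key, so memory use grows with the number of keys rather than with the size of the file, and no pre-sorting of the input is needed.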
The originally posted solution, based on old Unix sort options:
awk '{ arr[$1]+=$2 } END { for (key in arr) printf("%s\t%s\n", key, arr[key]) }' file \
  | sort +0n -1
I hope this helps.