I have a CSV file from which I would like to extract some information: for each distinct value in one column, I would like to compute the sum of the corresponding values in another column. I may eventually do it in Python, but I believe there could be a simple solution using awk.
This could be the CSV file:
2 1:2010-1-bla:bla 1.6
2 2:2010-1-bla:bla 1.1
2 2:2010-1-bla:bla 3.4
2 3:2010-1-bla:bla -1.3
2 3:2010-1-bla:bla 6.0
2 3:2010-1-bla:bla 1.1
2 4:2010-1-bla:bla -1.0
2 5:2010-1-bla:bla 10.9
I would like to get:
1 1.6
2 4.5
3 5.8
4 -1.0
5 10.9
For now, I can only extract:
a) the values of the first column:
awk -F ' ' '{print $(2)}' MyFile.csv | awk -F ':' '{print $(1)}'
and then get:
1
2
2
3
3
3
4
5
b) and the values equal to, say, 1.1 in the last column with:
awk -F ' ' '{print $(NF)}' MyFile.csv | awk '$1 == 1.1'
and then get:
1.1
1.1
I am not able to simultaneously extract the columns I am interested in, which may help me in the end. Here is a sample output which may ease the computation of the sums (I don't know):
1 1.6
2 1.1
2 3.4
3 -1.3
3 6.0
3 1.1
4 -1.0
5 10.9
Edit: Thanks to Elenaher, we could say the input is the file above.
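The two-column intermediate output shown in the question can probably be produced in a single awk call by splitting on spaces and colons at once, so that the key becomes field 2 and the value stays the last field. This is only a sketch: the file name `sample.txt` and the function name `extract_pairs` are made up for the illustration, and it assumes the key is always the text before the first colon of the second column.

```shell
# Hypothetical sample file holding the data from the question.
cat > sample.txt <<'EOF'
2 1:2010-1-bla:bla 1.6
2 2:2010-1-bla:bla 1.1
2 2:2010-1-bla:bla 3.4
2 3:2010-1-bla:bla -1.3
2 3:2010-1-bla:bla 6.0
2 3:2010-1-bla:bla 1.1
2 4:2010-1-bla:bla -1.0
2 5:2010-1-bla:bla 10.9
EOF

# Split fields on runs of spaces, tabs or colons: $2 is then the key
# (the part before the first colon) and $NF is the last column.
extract_pairs() {
    awk -F'[: \t]+' '{ print $2, $NF }' "$1"
}

extract_pairs sample.txt
```

Because the fields are printed as they were read, values such as `6.0` and `-1.0` keep their original formatting.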
$ awk -F"[: \t]+" '{a[$2]+=$NF}END{for(i in a ) print i,a[i] }' file
4 -1
5 10.9
1 1.6
2 4.5
3 5.8
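The `for (i in a)` loop visits the keys in an unspecified order, which is why the lines above do not come out sorted, and awk prints the sum `-1.0` as `-1`. If the exact output from the question matters, one option is to format the sums with `printf` and pipe the result through `sort -n`. A minimal sketch, assuming numeric keys and that one decimal place is enough; the file name `sample.txt` and the function name `sum_by_key` are hypothetical:

```shell
# Hypothetical sample file holding the data from the question.
cat > sample.txt <<'EOF'
2 1:2010-1-bla:bla 1.6
2 2:2010-1-bla:bla 1.1
2 2:2010-1-bla:bla 3.4
2 3:2010-1-bla:bla -1.3
2 3:2010-1-bla:bla 6.0
2 3:2010-1-bla:bla 1.1
2 4:2010-1-bla:bla -1.0
2 5:2010-1-bla:bla 10.9
EOF

# Sum the last column per key, print each sum with one decimal place,
# then sort numerically on the key. LC_ALL=C keeps '.' as the decimal
# point regardless of the user's locale.
sum_by_key() {
    LC_ALL=C awk -F'[: \t]+' '
        { sum[$2] += $NF }
        END { for (k in sum) printf "%s %.1f\n", k, sum[k] }
    ' "$1" | LC_ALL=C sort -n
}

sum_by_key sample.txt
```

On the sample data this prints the five lines asked for in the question, from `1 1.6` to `5 10.9`, in key order.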