I am processing output from a file in bash and need to group values by their keys.
For example, I have the following data
13,47099
13,54024
13,1
13,39956
13,0
17,126223
17,52782
17,4
17,62617
17,0
23,1022724
23,79958
23,80590
23,230
23,1
23,118224
23,0
23,1049
42,72470
42,80185
42,2
42,89199
42,0
54,70344
54,72824
54,1
54,62969
54,1
in a file and want to group all values from a particular key into a single line, as in
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
There are about 10000 entries in my input file. How do I transform this data in shell?
awk to the rescue!
Assuming keys are contiguous (all lines for a given key are adjacent in the file):
$ awk -F, 'p!=$1 {if(a) print a; a=p=$1}
{a=a FS $2}
END {print a}' file
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
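If the keys are not contiguous, the one-liner above would start a new output line every time the key changes. A minimal sketch of an alternative, assuming the whole file fits in memory (easily true for about 10000 entries), collects values per key in an awk array and prints the keys in first-seen order. The function name group_by_key and the sample input are made up for illustration.

```shell
# Hypothetical helper: groups "key,value" lines by key, even when
# lines for the same key are scattered through the input.
group_by_key() {
  awk -F, '
    # First time this key is seen: remember its position and seed the line.
    !($1 in vals) { order[++n] = $1; vals[$1] = $1 }
    # Every line: append the value to the line for its key.
    { vals[$1] = vals[$1] FS $2 }
    # At end of input: print one line per key, in first-seen order.
    END { for (i = 1; i <= n; i++) print vals[order[i]] }
  ' "$@"
}

# Made-up sample with interleaved keys.
printf '%s\n' '13,1' '17,2' '13,3' '17,4' | group_by_key
# prints:
# 13,1,3
# 17,2,4
```

It can also be called with a file name, as in group_by_key file. The trade-off is that nothing is printed until the whole input has been read.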
Here is a breakdown of what @karakfa's code is doing, for us awk beginners. I've written this based on a toy dataset file:
1,X
1,Y
3,Z
p!=$1: check whether the pattern p!=$1 is true
  $1 is the first field of the current (first) line of file (1 in this case)
  p is undefined at this point, so it cannot be equal to 1; p!=$1 is true and we continue with this line of code
if(a) print a: check whether variable a is set (non-empty) and print a if it is
  a is undefined at this point, so the print a command is not executed
a=p=$1: set variables a and p equal to the value of the first field of the current (first) line (1 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (first) line, separated by the field separator (1,X in this case)
END: since we haven't reached the end of file yet, we skip the rest of this line of code
Move to the next (second) line of file and restart the awk code on that line.

p!=$1: check whether the pattern p!=$1 is true
  p is 1 and the first field of the current (second) line is 1, so p!=$1 is false and we skip the rest of this line of code
a=a FS $2: set a equal to the value of a and the value of the second field of the current (second) line, separated by the field separator (1,X,Y in this case)
END: since we haven't reached the end of file yet, we skip the rest of this line of code
Move to the next (third) line of file and restart the awk code.

p!=$1: check whether the pattern p!=$1 is true
  p is 1 and $1 of the third line is 3, so p!=$1 is true and we continue with this line of code
if(a) print a: check whether variable a is set (non-empty) and print a if it is
  a is 1,X,Y at this point, so 1,X,Y is printed to the output
a=p=$1: set variables a and p equal to the value of the first field of the current (third) line (3 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (third) line, separated by the field separator (3,Z in this case)
END {print a}: since we have reached the end of file, execute this code
  print a: print the last group a (3,Z in this case)

The resulting output is
1,X,Y
3,Z
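One way to check the walkthrough above is to let awk print its own state. The sketch below is the same program with one added printf that reports p and a after each input line; the "# line N:" prefix and the function name trace_group are my own additions, not part of the original answer.

```shell
# Hypothetical helper: the original program plus a trace line per record.
trace_group() {
  awk -F, '
    # Key changed: flush the previous group (if any) and start a new one.
    p != $1 { if (a) print a; a = p = $1 }
    # Append the value, then show the state after processing this line.
    { a = a FS $2
      printf "# line %d: p=%s a=%s\n", NR, p, a }
    # End of input: flush the last group.
    END { print a }
  ' "$@"
}

printf '%s\n' '1,X' '1,Y' '3,Z' | trace_group
# prints:
# # line 1: p=1 a=1,X
# # line 2: p=1 a=1,X,Y
# 1,X,Y
# # line 3: p=3 a=3,Z
# 3,Z
```

Note that 1,X,Y appears before the trace for line 3: the finished group is printed as soon as the key changes, before the new line is appended to a.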
Please let me know if there are any errors in this description.