I have this code:
awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]", " : "\t") $2} END{for (i in a) print i a[i]} ' inputfile
and I would like to extend it to collapse rows that have more than two fields, always using the first field as the index.
Input file (three column tab-delimited):
protein_1 membrane 1e-4
protein_1 intracellular 1e-5
protein_2 membrane 1e-50
protein_2 citosol 1e-40
Desired output (three column tab-delimited):
protein_1 membrane, intracellular 1e-4, 1e-5
protein_2 membrane, citosol 1e-50, 1e-40
Thanks!
I'm stuck here:
awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]"\t" : "\t") $2};{a[$1]=(a[$1] ? a[$1]", " : "\t") $3} END{for (i in a) print i a[i]} ' 1 inputfile
With GNU awk for 2-D arrays:
$ gawk '
{ a[$1][$2] = $3 }
END {
    for (i in a) {
        printf "%s", i
        sep = "\t"
        for (j in a[i]) {
            printf "%s%s", sep, j
            sep = ", "
        }
        sep = "\t"
        for (j in a[i]) {
            printf "%s%s", sep, a[i][j]
            sep = ", "
        }
        print ""
    }
}' file
protein_1 membrane, intracellular 1e-4, 1e-5
protein_2 membrane, citosol 1e-50, 1e-40
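If GNU awk is not available, the same idea can be sketched with plain one-dimensional arrays. This is only a minimal sketch, assuming tab-delimited input with exactly three columns; the function name collapse and the array names keys, c2 and c3 are made up for illustration. It keeps the seen[$1,$2] de-duplication from the question and prints keys in the order they first appear in the input, rather than relying on "for (i in a)" ordering.

```shell
# collapse FILE: merge rows sharing column 1, joining columns 2 and 3 with ", "
# (hypothetical helper name; assumes tab-delimited, three-column input)
collapse() {
    awk -F'\t' -v OFS='\t' '
    seen[$1,$2]++ { next }            # skip repeated (col1, col2) pairs, as in the question
    {
        if (!($1 in c2)) {            # first time this key is seen
            keys[++n] = $1            # remember input order of keys
            c2[$1] = $2
            c3[$1] = $3
        } else {                      # append to the existing lists
            c2[$1] = c2[$1] ", " $2
            c3[$1] = c3[$1] ", " $3
        }
    }
    END {
        for (k = 1; k <= n; k++)
            print keys[k], c2[keys[k]], c3[keys[k]]
    }' "$1"
}

# Sample data from the question, written to the file name used there
printf 'protein_1\tmembrane\t1e-4\nprotein_1\tintracellular\t1e-5\nprotein_2\tmembrane\t1e-50\nprotein_2\tcitosol\t1e-40\n' > inputfile
collapse inputfile
```

Because both lists are built in the same pass, the values in column 3 stay aligned with the values in column 2. Note that a row skipped as a duplicate (col1, col2) pair also drops its third field.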
perl -lane'
    $ar = $h{shift @F} ||= [];
    push @{$ar->[$_]}, $F[$_] for 0,1;
    END {
        $" = ", ";
        print "$_\t@{$h{$_}[0]}\t@{$h{$_}[1]}" for sort keys %h;
    }
' file
Output:
protein_1 membrane, intracellular 1e-4, 1e-5
protein_2 membrane, citosol 1e-50, 1e-40