Collapse rows with multiple fields

Question

I have this code:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]", " : "	") $2} END{for (i in a) print i a[i]} ' inputfile

and I would like to extend it to collapse rows with more than two fields, always using the first field as the index.

Input file (three column tab-delimited):

protein_1   membrane    1e-4
protein_1   intracellular   1e-5
protein_2   membrane    1e-50
protein_2   citosol 1e-40

Desired output (three column tab-delimited):

protein_1   membrane, intracellular 1e-4, 1e-5
protein_2   membrane, citosol   1e-50, 1e-40

Thanks!

I am stuck here:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]"	" : "	") $2};{a[$1]=(a[$1] ? a[$1]", " : "	") $3} END{for (i in a) print i a[i]} ' 1 inputfile

Ed Morton · Accepted Answer

With GNU awk (4.0 or later) for true 2-D arrays:

$ gawk '
{ a[$1][$2] = $3 }
END {
    for (i in a) {
        printf "%s", i
        sep = "	"
        for (j in a[i]) {
            printf "%s%s", sep, j
            sep = ", "
        }
        sep = "	"
        for (j in a[i]) {
            printf "%s%s", sep, a[i][j]
            sep = ", "
        }
        print ""
    }
}' file
protein_1       membrane, intracellular 1e-4, 1e-5
protein_2       membrane, citosol       1e-50, 1e-40

mpapec · Answer

perl -lane'
  $ar = $h{shift @F} ||= [];
  push @{$ar->[$_]}, $F[$_] for 0,1;
  END {
    $" = ", ";
    print "$_	@{$h{$_}[0]}	@{$h{$_}[1]}" for sort keys %h;
  }
' file

output

protein_1 membrane, intracellular 1e-4, 1e-5
protein_2 membrane, citosol 1e-50, 1e-40

Tags: awk, perl, biotech
