
Collapse rows with multiple fields

Tags: awk, perl

I have this code:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]", " : "\t") $2} END{for (i in a) print i a[i]} ' inputfile

and I would like to extend it so that it collapses rows with more than two fields, always using the first field as the index.

Input file (three column tab-delimited):

protein_1   membrane    1e-4
protein_1   intracellular   1e-5
protein_2   membrane    1e-50
protein_2   citosol 1e-40

Desired output (three column tab-delimited):

protein_1   membrane, intracellular 1e-4, 1e-5
protein_2   membrane, citosol   1e-50, 1e-40

Thanks!

I'm stuck here:

awk '!seen[$1,$2]++{a[$1]=(a[$1] ? a[$1]"\t" : "\t") $2};{a[$1]=(a[$1] ? a[$1]", " : "\t") $3} END{for (i in a) print i a[i]} ' 1 inputfile
biotech asked Dec 20 '22 17:12


2 Answers

With GNU awk (version 4.0 or later), which supports true 2-D arrays (arrays of arrays):

$ gawk '
{ a[$1][$2] = $3 }
END {
    for (i in a) {
        printf "%s", i
        sep = "\t"
        for (j in a[i]) {
            printf "%s%s", sep, j
            sep = ", "
        }
        sep = "\t"
        for (j in a[i]) {
            printf "%s%s", sep, a[i][j]
            sep = ", "
        }
        print ""
    }
}' file
protein_1       membrane, intracellular 1e-4, 1e-5
protein_2       membrane, citosol       1e-50, 1e-40
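
For awks without arrays of arrays, the same idea can be sketched with POSIX-style `(key, column)` subscripts. This is only a sketch under a few assumptions: the input is strictly tab-delimited, keys should be printed in first-seen order, and every row is kept (it does not deduplicate on the second field the way `a[$1][$2] = $3` does). The `collapse` function name and the scratch `inputfile` are illustrative and not part of the original question.

```shell
# Recreate the sample input from the question in a scratch file (assumed name).
printf 'protein_1\tmembrane\t1e-4\nprotein_1\tintracellular\t1e-5\nprotein_2\tmembrane\t1e-50\nprotein_2\tcitosol\t1e-40\n' > inputfile

# collapse FILE...: join fields 2..NF per key ($1) into ", "-separated lists.
collapse() {
    awk -F'\t' -v OFS='\t' '
    {
        key = $1
        if (!(key in seen)) {          # remember keys in first-seen order
            seen[key] = 1
            order[++n] = key
        }
        for (f = 2; f <= NF; f++) {    # append each field to its per-key column list
            if ((key, f) in col)
                col[key, f] = col[key, f] ", " $f
            else
                col[key, f] = $f
        }
        if (NF > maxnf) maxnf = NF     # widest row decides how many columns to print
    }
    END {
        for (k = 1; k <= n; k++) {
            line = order[k]
            for (f = 2; f <= maxnf; f++)
                line = line OFS col[order[k], f]
            print line
        }
    }' "$@"
}

collapse inputfile
```

Because the keys are stored in a separate `order` array, the output does not depend on the iteration order of `for (i in a)`, which awk leaves unspecified. The loop over `f` also means the same script should handle four or more columns without changes.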
Ed Morton answered Dec 22 '22 05:12


perl -lane'
  $ar = $h{shift @F} ||= [];
  push @{$ar->[$_]}, $F[$_] for 0,1;
  END {
    $" = ", ";
    print "$_\t@{$h{$_}[0]}\t@{$h{$_}[1]}" for sort keys %h;
  }
' file

Output:

protein_1 membrane, intracellular 1e-4, 1e-5
protein_2 membrane, citosol 1e-50, 1e-40
mpapec answered Dec 22 '22 07:12