I have this input file (1 = active, 0 = inactive):
a  1
a  0                    
b  1                      
b  1
b  0
c  0 
c  0
c  0
c  0
.
.
.
I want output like this:
 X       repeats            active count    inactive count
 a       2 times                 1               1 
 b       3 times                 2               1 
 c       4 times                 0               4 
I tried:
awk -F "," '{if ($2==1) a[$1]++; } END { for (i in a); print i, a[i] }'file name
But that did not work.
How can I get the output?
Just to give you an idea this awk should work:
awk '$2{a[$1]++; next} {b[$1]++; if (!($1 in a)) a[$1]=0} END{for (i in a) print i, a[i], b[i]+0, (a[i]+b[i])}' file
a 1 1 2
b 2 1 3
c 0 4 4
You can format the output any way you want.
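As one possible way to get closer to the layout in the question, here is a minimal sketch that uses printf with fixed column widths. The function name summarize, the array names, and the column widths are illustrative assumptions, not part of the original answer. It assumes the second field is always 0 or 1, so summing it gives the active count.

```shell
# summarize: hypothetical helper; reads "key flag" lines from stdin
summarize() {
    awk '
    { total[$1]++; active[$1] += $2 }   # flag is 1 (active) or 0 (inactive)
    END {
        # header row, left-aligned in fixed-width columns
        printf "%-8s%-12s%-16s%s\n", "X", "repeats", "active count", "inactive count"
        for (k in total)
            printf "%-8s%-12s%-16d%d\n", k, total[k] " times", active[k], total[k] - active[k]
    }'
}

# sample data from the question
printf 'a 1\na 0\nb 1\nb 1\nb 0\nc 0\nc 0\nc 0\nc 0\n' | summarize
```

awk does not guarantee the order of for (k in total), so the rows may come out in any order; sort the data rows afterwards if a fixed order matters.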
You can try:
awk -f r.awk input.txt
where input.txt is your data file, and r.awk is:
{
    X[$1]++
    if ($2) a[$1]++
    else ia[$1]++
}
END {
    printf "X\tRepeat\tActive\tInactive\n"
    for (i in X) {
        printf "%s\t%d\t%d\t%d\n", i, X[i], a[i], ia[i]
    }
}
awk '{a[$1]++; if ($2!=0) {b[$1]++;c[$1]+=0} else {c[$1]++;b[$1]+=0}}END {for (i in a) print i, a[i], b[i], c[i]}' file
Here is another simple way to do it with awk:
awk '{a[$1]++;b[$1]+=$2} END { for (i in a) print i,a[i],b[i],a[i]-b[i]}' file
a 2 1 1
b 3 2 1
c 4 0 4
No test is needed: summing column $2 gives the number of active entries, and subtracting that sum from the repeat count gives the number of inactive entries.
awk '
{ repeats[$1]++; counts[$1,$2]++ }
END {
    for (key in repeats)
        print key, repeats[key], counts[key,1]+0, counts[key,0]+0
}
' file