I have a text file with 10 columns, say f.txt, which looks like this:
aab abb 263-455
aab abb 263-455
aab abb 263-455
bbb abb 26-455
bbb abb 26-455
bbb aka 264-266
bga bga 230-232
bga bga 230-232
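For each string in the first and second columns, I want to count the number of unique third-column values it occurs with, counting the first and second columns separately (so bga, which appears in both columns next to 230-232, gets 2).
Expected output: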
aab - 1
abb - 2
bbb - 2
aka - 1
bga - 2
Total no - 8
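awk '
# Key on column number + string + third-column value, so the same string in
# column 1 and in column 2 is counted separately; tot counts every new key.
!s[1":"$1":"$3]++{sU[$1]++;tot++}
!s[2":"$2":"$3]++{sU[$2]++;tot++}
END{
for (x in sU) print x, sU[x];
print "Total No -",tot;
}' input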
Output (line order may vary, because awk's for-in loop visits keys in an unspecified order):
bga 2
aab 1
bbb 2
aka 1
abb 2
Total No - 8
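To see why bga ends up with a count of 2, here is a minimal sketch that prints each composite key the first time the answer above would store it in s[]. It reuses the same key format; the function name show_keys is hypothetical, and the sample rows are the ones from the question.

```shell
#!/bin/sh
# Print every new (column, string, third-column value) key, exactly as the
# answer builds them. show_keys is a hypothetical helper name.
show_keys() {
  awk '
    # Same composite keys as the answer: column number, string, range
    !s[1":"$1":"$3]++ { print 1":"$1":"$3 }
    !s[2":"$2":"$3]++ { print 2":"$2":"$3 }
  ' "$@"
}

printf '%s\n' 'aab abb 263-455' 'aab abb 263-455' 'aab abb 263-455' \
  'bbb abb 26-455' 'bbb abb 26-455' 'bbb aka 264-266' \
  'bga bga 230-232' 'bga bga 230-232' | show_keys
# Prints 8 keys, including both 1:bga:230-232 and 2:bga:230-232
```

Each printed key corresponds to one increment of tot, which is why the total is 8 and why bga (seen once in each column) gets 2.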
This will do the trick:
$ awk '!a[$0]++{c[$1]++;c[$2]++}
END{for(k in c){print k" - "c[k];s+=c[k]}print "\nTotal No -",s}' file
aka - 1
bga - 2
aab - 1
abb - 2
bbb - 2
Total No - 8
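The question mentions that the real file has 10 columns, while !a[$0]++ deduplicates on the whole line, so two rows that agree in columns 1 to 3 but differ in a later column would both be counted. Assuming rows should count as duplicates whenever the first three columns match, a minimal variant keys the array on those fields only. The function name count_first3 and the extra fourth-column values in the usage line are hypothetical.

```shell
#!/bin/sh
# Assumption: rows are duplicates when columns 1-3 match, even if later
# columns differ. count_first3 is a hypothetical helper name.
count_first3() {
  awk '
    # a[$1,$2,$3] keys on the first three fields only (joined with SUBSEP)
    !a[$1,$2,$3]++ { c[$1]++; c[$2]++ }
    END {
      for (k in c) { print k " - " c[k]; s += c[k] }
      print "Total No -", s
    }
  ' "$@"
}

# The first two rows differ only in a (made-up) fourth column
printf '%s\n' 'aab abb 263-455 x' 'aab abb 263-455 y' 'bga bga 230-232 z' | count_first3
# Prints aab - 1, abb - 1, bga - 2 (in some order), then Total No - 4
```

With !a[$0]++ instead, the same input would give aab - 2 and abb - 2.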
In the more readable script form:
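# Only the first occurrence of each distinct line is counted
!lines[$0]++ {
    count[$1]++
    count[$2]++
}
END {
    # Print each string's count and accumulate the grand total
    for (line in count) {
        print line" - "count[line]
        sum += count[line]
    }
    print "\nTotal No -",sum
}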
To run it in this form, save it to a file named script.awk and run:
$ awk -f script.awk file
aka - 1
bga - 2
aab - 1
abb - 2
bbb - 2
Total No - 8
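awk's for (k in count) loop visits keys in an unspecified order, which is why the output above is not alphabetical. Assuming alphabetical order is wanted, one portable approach is to pipe each result line through sort and close the pipe before printing the total. This is a minimal sketch; the function name count_sorted is hypothetical.

```shell
#!/bin/sh
# Same counting as script.awk, but result lines go through sort(1) so the
# order is stable. count_sorted is a hypothetical helper name.
count_sorted() {
  awk '
    !lines[$0]++ { count[$1]++; count[$2]++ }
    END {
      cmd = "LC_ALL=C sort"           # byte-wise sort, independent of locale
      for (k in count) {
        print k " - " count[k] | cmd
        sum += count[k]
      }
      close(cmd)                      # wait for sort to finish writing first
      print "Total No -", sum
    }
  ' "$@"
}

printf '%s\n' 'aab abb 263-455' 'aab abb 263-455' 'aab abb 263-455' \
  'bbb abb 26-455' 'bbb abb 26-455' 'bbb aka 264-266' \
  'bga bga 230-232' 'bga bga 230-232' | count_sorted
# aab - 1
# abb - 2
# aka - 1
# bbb - 2
# bga - 2
# Total No - 8
```

If GNU awk is available, setting PROCINFO["sorted_in"] = "@ind_str_asc" before the loop gives the same ordering without an external sort.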