Counting unique occurrences in each column

Tags: awk

I have a file with several columns, such as $2 and $3 (up to $32), as in

A refdevhet devdevhomo
B refdevhet refdevhet
C refrefhomo refdevhet
D devrefhet  refdevhet

I need to count the occurrences of each unique element in each column separately, so that I have

refdevhet  2 3
refrefhomo 1 0
devrefhet  1 0
devdevhomo 0 1

I tried several variations of

awk 'BEGIN {
  FS=OFS="\t"
}

{
  for(i=1; i<=32; i++) a[$i]++
}

END {
  for (i in a) print i, a[i]
}' file

but instead it prints the cumulative count of each unique element across all of the selected fields.

asked Nov 30 '20 19:11

Madza Farias-Virgens

1 Answer

Here is a solution:

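BEGIN {
    FS=OFS="\t"
}
{
    # Remember the widest row so every output line has the same number of columns.
    if (NF>mxf) mxf = NF
    for (i=1; i<=NF; i++) {
        ws[$i]=1    # set of all distinct values seen in any column
        c[$i,i]++   # count of this value in column i
    }
}
END {
    for (w in ws) {
        printf "%s", w
        # A missing c[w,i] is an empty string, which %d prints as 0.
        for (i=1; i<=mxf; i++) printf "%s%d", OFS, c[w,i]
        print ""
    }
}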
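
The difference from the attempt in the question is the array index: a[$i] is keyed by the value alone, so counts from all columns land in the same element, while c[$i,i] is keyed by the value and the column number together. Here is a minimal sketch of that contrast. The function name demo_keys, the array names total and percol, and the two-row input are made up for illustration.

```shell
# Hypothetical demo: compare a value-only key with a (value, column) key.
demo_keys() {
    printf 'A\tx\ty\nB\tx\tx\n' | awk -F'\t' '
    {
        for (i = 2; i <= NF; i++) {
            total[$i]++        # keyed by value only: columns are merged
            percol[$i, i]++    # keyed by value and column: columns stay separate
        }
    }
    END {
        printf "x total=%d col2=%d col3=%d\n", total["x"], percol["x", 2], percol["x", 3]
    }'
}

demo_keys
```

The value x appears twice in column 2 and once in column 3, so the value-only key reports 3 while the composite key keeps 2 and 1 apart.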

Note that this solution is general: it counts every column, including the first. To omit the first column, change i=1 to i=2 in both places.
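
As a sketch of that variant, the following wraps the script with i=2 in a shell function and runs it on the sample data from the question. The function name count_per_column is hypothetical, and the input is assumed to be tab-separated with a row label in column 1. Because awk does not define the iteration order of for (w in ws), the output is piped through sort here to make it reproducible; drop the sort if row order does not matter.

```shell
# Hypothetical wrapper: per-column counts, skipping column 1.
# Reads the files given as arguments, or standard input if there are none.
count_per_column() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        if (NF > mxf) mxf = NF
        for (i = 2; i <= NF; i++) { ws[$i] = 1; c[$i, i]++ }
    }
    END {
        for (w in ws) {
            printf "%s", w
            for (i = 2; i <= mxf; i++) printf "%s%d", OFS, c[w, i]
            print ""
        }
    }' "$@" | LC_ALL=C sort
}

# Sample data from the question, written with explicit tabs.
printf 'A\trefdevhet\tdevdevhomo\nB\trefdevhet\trefdevhet\nC\trefrefhomo\trefdevhet\nD\tdevrefhet\trefdevhet\n' |
    count_per_column
```

On this input the refdevhet row reads 2 and 3, matching the counts requested in the question, although the rows come out in sorted order rather than the order shown there.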

answered Oct 07 '22 02:10

Andriy Makukha
