
Counting unique occurrences in each column

Tags: awk

I have a file with several columns like $2, $3 (up to $32), as in

A refdevhet devdevhomo
B refdevhet refdevhet
C refrefhomo refdevhet
D devrefhet  refdevhet

I need to count the occurrences of each unique element in each column separately,

so that I have

refdevhet  2 3
refrefhomo 1 0
devrefhet  1 0
devdevhomo 0 1

I tried several variations of

awk 'BEGIN {
  FS=OFS="\t"
}

{
  for(i=1; i<=32; i++) a[$i]++
}

END {
  for (i in a) print i, a[i]
}' file

but instead it's printing the cumulative sum of occurrences of unique elements across the selected fields.

Madza Farias-Virgens asked Nov 30 '20 19:11



1 Answer

Here is a solution:

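BEGIN {
    FS=OFS="\t"                      # tab-separated input and output
}
{
    if (NF>mxf) mxf = NF;            # track the widest row seen
    for(i=1; i<=NF; i++) {
        ws[$i]=1                     # remember every distinct value
        c[$i,i]++                    # count per (value, column) pair
    }
}
END {
    for (w in ws) {                  # iteration order is unspecified
        printf "%s", w
        for (i=1;i<=mxf;i++) printf "%s%d", OFS, c[w,i];   # unseen pairs print as 0
        print ""
    }
}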

Note that this solution is general: it handles any number of columns and counts the first column as well. To omit the first column, change i=1 to i=2 in both places.
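For reference, here is a minimal sketch of the i=2 variant run on the sample data from the question. The wrapper function name count_per_column is made up for illustration, and the sample is assumed to be tab-separated (the post shows it with spaces). The output is piped through sort because awk does not guarantee the order of for (w in ws).

```shell
# Hypothetical wrapper, not part of the original answer: the same script
# with both loops starting at i=2 so the first column is skipped.
count_per_column() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        if (NF > mxf) mxf = NF                 # widest row seen so far
        for (i = 2; i <= NF; i++) {
            ws[$i] = 1                         # set of distinct values
            c[$i, i]++                         # count per (value, column)
        }
    }
    END {
        for (w in ws) {
            printf "%s", w
            for (i = 2; i <= mxf; i++) printf "%s%d", OFS, c[w, i]
            print ""
        }
    }' "$@"
}

# Sample rows from the question, assumed to be tab-separated.
printf 'A\trefdevhet\tdevdevhomo\nB\trefdevhet\trefdevhet\nC\trefrefhomo\trefdevhet\nD\tdevrefhet\trefdevhet\n' |
    count_per_column | LC_ALL=C sort
# prints (tab-separated):
# devdevhomo  0  1
# devrefhet   1  0
# refdevhet   2  3
# refrefhomo  1  0
```

The counts match the table the question asks for. If the real file is separated by spaces rather than tabs, dropping the FS assignment (or setting only OFS) should make awk split on runs of whitespace instead.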

Andriy Makukha answered Oct 07 '22 02:10