Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to count unique values per category in Excel

Tags:

excel

I have looked around on this site and online for an answer to this question, but I have not been sucessful yet so I will ask in a new thread.

I need to be able to count the number of ocurrences of a particular text per category in Excel using a formula (or a number of formulas). Preferably this can be done without the use of VBA, but if it is nessesary I will attempt to write a VBA command.

Here is what I am working with:

Cat.    Word
A       doo
A       foo
A       foo
A       loo
B       loo
B       goo
B       loo
A       roo
A       lol
B       rar
C       goo
...      ...

I would like to have a formula that counts the number of unique occurrences of each word per category. The categories are known and finite, and are included in a list within my excel sheet, but the words can be any amount and are not known. Each category can have words that other categories would have as well the end result should look something like this:

Cat.    Count of distinct ocurrences per category
A       5
B       3
C       1

Finally, I know that I can deduplicate the list, but I would like to keep the list intact for archival purposes, so a count that does not require deduping would be ideal.

Here is what I tried to do, but did not work in the end:

=IF(A:A=C1,SUM(IF(FREQUENCY(MATCH(B1:B11,B1:B11,0),MATCH(B1:B11,B1:B11,0))>0,1)))

Thank you so much for your help!

like image 774
Luchadeer Avatar asked Oct 08 '12 18:10

Luchadeer


2 Answers

Try this "array formula"

=SUM(IF(FREQUENCY(IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0))),ROW(B$1:B$11)-ROW(B$1)+1),1))

confirmed with CTRL+SHIFT+ENTER

Where C1 is the first category to count for. Copy formula down for all categories

Explanation

The basic idea of the formula is that the first IF is a "filter" to ensure that you only look at B values if A value = C1, the second IF filters out blanks in B, otherwise the MATCH gives the position of the first match for that value in B (thus giving the same number for the same values)

The result of the IFs is the first part of FREQUENCY function. The "bins" in FREQUENCY are formed by the ROW-ROW+1 part which gives you a range of numbers 1 to n which will encompass the whole range of values that MATCH might give.

Now we just need to count the number of different numbers returned by the IFs (as that will be equal to the number of different values in B for your criterion), so the final 1 in the formula is assigned to each non-zero value in FREQUENCY and the result is summed.

In detail for your example where C1 = A this part

=IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0)))

results in this array

{1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE}

FALSE values where col A <> "A" otherwise numbers from MATCH giving the position of the first match for each value - notice 2 appears twice because foo appears twice, firstly in row 2

and the ROW(B$1:B$11)-ROW(B$1)+1 part gives you this array

{1;2;3;4;5;6;7;8;9;10;11}

so now the formula looks like this

=SUM(IF(FREQUENCY({1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE},{1;2;3;4;5;6;7;8;9;10;11}),1))

So FREQUENCY puts the values in the first array into the relevant "bins" and formula now resolves to this

=SUM(IF({1;2;0;1;0;0;0;1;1;0;0;0},1))

Note that the array returned by FREQUENCY sums to 6 (six rows with value "A") but there are only 5 non-zero numbers, denoting the five different values for criterion "A".

Now the last IF returns a 1 for each non-zero value and SUM function sums those to get 5

like image 126
barry houdini Avatar answered Sep 27 '22 19:09

barry houdini


Personally, I prefer to do that using Pivot Tables. For anything larger than an occasional spreadsheet, SQL queries in Microsoft Access or other databases win the game.

like image 41
Deer Hunter Avatar answered Sep 27 '22 19:09

Deer Hunter