I have looked around on this site and online for an answer to this question, but I have not been sucessful yet so I will ask in a new thread.
I need to be able to count the number of ocurrences of a particular text per category in Excel using a formula (or a number of formulas). Preferably this can be done without the use of VBA, but if it is nessesary I will attempt to write a VBA command.
Here is what I am working with:
Cat. Word
A doo
A foo
A foo
A loo
B loo
B goo
B loo
A roo
A lol
B rar
C goo
... ...
I would like to have a formula that counts the number of unique occurrences of each word per category. The categories are known and finite, and are included in a list within my excel sheet, but the words can be any amount and are not known. Each category can have words that other categories would have as well the end result should look something like this:
Cat. Count of distinct ocurrences per category
A 5
B 3
C 1
Finally, I know that I can deduplicate the list, but I would like to keep the list intact for archival purposes, so a count that does not require deduping would be ideal.
Here is what I tried to do, but did not work in the end:
=IF(A:A=C1,SUM(IF(FREQUENCY(MATCH(B1:B11,B1:B11,0),MATCH(B1:B11,B1:B11,0))>0,1)))
Thank you so much for your help!
Try this "array formula"
=SUM(IF(FREQUENCY(IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0))),ROW(B$1:B$11)-ROW(B$1)+1),1))
confirmed with CTRL+SHIFT+ENTER
Where C1 is the first category to count for. Copy formula down for all categories
Explanation
The basic idea of the formula is that the first IF
is a "filter" to ensure that you only look at B values if A value = C1, the second IF
filters out blanks in B, otherwise the MATCH
gives the position of the first match for that value in B (thus giving the same number for the same values)
The result of the IFs
is the first part of FREQUENCY
function. The "bins" in FREQUENCY
are formed by the ROW-ROW+1 part which gives you a range of numbers 1 to n which will encompass the whole range of values that MATCH
might give.
Now we just need to count the number of different numbers returned by the IF
s (as that will be equal to the number of different values in B for your criterion), so the final 1 in the formula is assigned to each non-zero value in FREQUENCY
and the result is summed.
In detail for your example where C1 = A this part
=IF(A$1:A$11=C1,IF(B$1:B$11<>"",MATCH(B$1:B$11,B$1:B$11,0)))
results in this array
{1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE}
FALSE values where col A <> "A" otherwise numbers from MATCH
giving the position of the first match for each value - notice 2 appears twice because foo appears twice, firstly in row 2
and the ROW(B$1:B$11)-ROW(B$1)+1
part gives you this array
{1;2;3;4;5;6;7;8;9;10;11}
so now the formula looks like this
=SUM(IF(FREQUENCY({1;2;2;4;FALSE;FALSE;FALSE;8;9;FALSE;FALSE},{1;2;3;4;5;6;7;8;9;10;11}),1))
So FREQUENCY
puts the values in the first array into the relevant "bins" and formula now resolves to this
=SUM(IF({1;2;0;1;0;0;0;1;1;0;0;0},1))
Note that the array returned by FREQUENCY
sums to 6 (six rows with value "A") but there are only 5 non-zero numbers, denoting the five different values for criterion "A".
Now the last IF returns a 1 for each non-zero value and SUM
function sums those to get 5
Personally, I prefer to do that using Pivot Tables. For anything larger than an occasional spreadsheet, SQL queries in Microsoft Access or other databases win the game.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With