Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected COUNTIF behavior when mixing number and text (EXCEL)

This question is about how Excel's COUNTIF function treats different data types when used as an array formula.

There are lots of good posts out there detailing how to use COUNTIF for tasks such as extracting unique values from a list, for example this post. I've managed to use examples from this and other posts to solve specific problems, but I'm trying to get a deeper understanding of array formulas in order to adapt my formulas to new needs.

I came across a peculiar behavior of COUNTIF. In general, Excel seems to treat strings as "larger than" numbers, so that the following examples are valid:

Cell Formula      Returns
=1<2              TRUE
="a"<"b"          TRUE
="a">"b"          FALSE
=1<"b"            TRUE

Now, suppose range A1:A6 contains the following data set:

1
2
3
A
B
C   

For each cell in this set, I want to check how many of all the cells in the set that are smaller than or equal to that cell (a useful technique in more complex formulas). I enter the following array formula in range B1:B6:

{=COUNTIF($A$1:$A$6,"<="&$A$1:$A$6)} (CTRL + SHIFT + ENTER)

Based on the examples above comparing numbers and strings (also illustrated in Column D below), I would expect the output shown below to look like Column C. However, the array formula returns the result shown in Column B, which suggests that strings and number elements are counted separately by arraywise COUNTIF.

Column A     Column B     Column C     Column D
1            1            1            A1<"C" = TRUE
2            2            2            A2<"C" = TRUE
3            3            3            A3<"C" = TRUE
A            1            4            A4<"C" = TRUE
B            2            5            A5<"C" = TRUE
C            3            6            A6<"C" = FALSE

So the question is how to produce the output in Column C? (EDIT: Just to clarify, I'm specifically looking for solutions that make use of COUNTIF's array properties.)

Any insight into why arraywise COUNTIF apparently behaves differently than the single-cell examples would also be much appreciated.

NOTE: I've translated the examples from a non-English version of Excel, so I apologize in advance for any typos.

PS. For a background, I ran into this problem when I tried to build a formula that would both extract unique values from a list with possible duplicates, and sort the unique values in numerical/alphabetical order. My current solution is to do this in two steps. One solution for how to do it in one step is proposed here.

like image 754
Egalth Avatar asked Oct 27 '15 01:10

Egalth


People also ask

Does Countif work with text?

Text Cells can be easily found in Excel using COUNTIF or COUNTIFS functions. The COUNTIF function searches text cells based on specific criteria and in the defined range. As in the example below, the defined range is table Name list, and text criteria is defined using wildcard “*”.

How do I fix Countif errors?

Solution: Open the linked workbook indicated in the formula, and press F9 to refresh the formula. You can also work around this issue by using SUM and IF functions together in an array formula. See SUMIF, COUNTIF and COUNTBLANK functions return #VALUE! error for more information.

Does Countifs work on merged cells?

With the help of LOOKUP, without unmerging, we can use COUNTIF or COUNTIFS in merged cells in Google Sheets.


2 Answers

First of all, excellently laid-out question, and on an interesting topic to boot.

I also raised an eyebrow when I first came across this behaviour of the COUNTIF(S)/SUMIF(S) functions. In their defence, I suppose we could construct situations in which we actually want strings and numerics to be considered separately.

In order to construct your required in-formula array, you will need something like:

MMULT(0+(TRANSPOSE($A$1:$A$6)<=$A$1:$A$6),ROW($A$1:$A$6)^0)

though note that the necessary transposition will mean that any set-up which includes this construction will require committing with CSE.

Regards

like image 118
XOR LX Avatar answered Oct 23 '22 03:10

XOR LX


The different behavior can easily shown if you compare

=COUNTIF($A$1:$A$6,"<=A")

with

{=COUNT(IF($A$1:$A$6<="A",1))}

The first will only get text values from $A$1:$A$6 because it is clearly text to compare and it is faster ignoring other values then. =COUNTIF($A$1:$A$6,"<=3") will only get numeric values from $A$1:$A$6 because of the same reasons. Even if the criterion would be a concatination with a cell reference, then the concatination would be the first process and would lead either to "<=A" or "<=3". So it is ever clear what to compare, text or numbers.

The second first needs an array of the comparisons, then performs the IF, gets so an array of 1 or FALSE and counts then. But the "A" could also be a cell reference. So it is not clear what to compare at the beginning and the first array has to compare all values in $A$1:$A$6.

So COUNTIF(S) and SUMIF(S) cant be used comparing mixed text and numeric data.

The solution is shown already by XOR LX.

Btw.: with your PS. For a background you should consider the following solution from an German Excel site: http://www.excelformeln.de/formeln.html?welcher=236.

In your linked example:

Formula in B2 downwards

{=INDEX($A$2:$A$99,MATCH(LARGE(COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),ROWS($1:1)),COUNTIF(A$2:A$99,">="&A$2:A$99)+99*ISNUMBER(A$2:A$99),0))&""}

In this solution the COUNTIF compares with >= so the biggest text or number will count lowest and so get the lowest position. All number positions are added with 99. So they are ever greater than all possible text positions. So we have a descended sorted array. Then, using LARGE, the list is created from the highest to the lowest position.

like image 21
Axel Richter Avatar answered Oct 23 '22 05:10

Axel Richter