 

How to count TRUE values in a logical vector

Tags:

r

In R, what is the most efficient/idiomatic way to count the number of TRUE values in a logical vector? I can think of two ways:

    z <- sample(c(TRUE, FALSE), 1000, rep = TRUE)
    sum(z)
    # [1] 498
    table(z)["TRUE"]
    # TRUE
    #  498

Which do you prefer? Is there anything even better?

asked Feb 03 '10 at 09:02 by Jyotirmoy Bhattacharya



2 Answers

The safest way is to use sum with na.rm = TRUE:

sum(z, na.rm = TRUE) # best way to count TRUE values 

For the vector `c(TRUE, FALSE, NA)` used in the example below, this gives 1.

Other solutions run into problems when the logical vector contains NA values.

See for example:

    z <- c(TRUE, FALSE, NA)
    sum(z)                # gives you NA
    table(z)["TRUE"]      # gives you 1
    length(z[z == TRUE])  # f3lix's answer; gives you 2 (indexing with NA returns an NA element)
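A minimal sketch that runs these approaches side by side on the same three-element vector, so the NA behaviour can be checked directly. The helper name `count_true_variants` is hypothetical, not part of base R.

```r
# Hypothetical helper: apply each counting approach discussed here to one vector
count_true_variants <- function(z) {
  c(
    sum       = sum(z),                          # NA propagates to the result
    sum_na_rm = sum(z, na.rm = TRUE),            # NA dropped before summing
    table     = as.integer(table(z)["TRUE"]),    # table() excludes NA by default
    index_len = length(z[z == TRUE])             # NA index adds an extra NA element
  )
}

z <- c(TRUE, FALSE, NA)
print(count_true_variants(z))
```

Only `sum_na_rm` and `table` agree with the true count here, and the `table` variant still has the missing-level problem described next.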

Additionally, the table solution is less efficient (look at the source code of the table function).

Also, be careful with the table solution when there are no TRUE values in the logical vector. See for example:

    z <- c(FALSE, FALSE)
    table(z)["TRUE"]  # gives you `NA`

or

    z <- c(NA, FALSE)
    table(z)["TRUE"]  # gives you `NA`

answered Oct 01 '22 at 06:10 by Marek
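If you still want a table-based count, one workaround (not from the answers here, but base R only) is to declare both levels up front with `factor()`, so `table()` always reports a count for `"TRUE"`, possibly 0. `count_true_table` is a hypothetical helper name.

```r
# Hypothetical helper: table()-based count that returns 0 instead of NA
# when no TRUE values are present, by fixing the factor levels up front.
count_true_table <- function(z) {
  counts <- table(factor(z, levels = c(FALSE, TRUE)))  # NA is excluded by default
  as.integer(counts["TRUE"])
}

count_true_table(c(FALSE, FALSE))
count_true_table(c(NA, FALSE))
count_true_table(c(TRUE, NA, TRUE))
```

This fixes the NA result but not the speed issue; `sum(z, na.rm = TRUE)` remains the simpler choice.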


Another option that hasn't been mentioned is to use which:

length(which(z)) 
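A quick check, on a made-up vector containing NA, that `length(which(z))` agrees with `sum(z, na.rm = TRUE)`: `which()` reports only the positions that are TRUE, so NA entries are skipped rather than propagated.

```r
z <- c(TRUE, NA, FALSE, TRUE, NA)

n_which <- length(which(z))      # positions 1 and 4 are TRUE
n_sum   <- sum(z, na.rm = TRUE)  # NA removed, then each TRUE counts as 1

identical(n_which, n_sum)
```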

To give some context on the "which is faster?" question: it's always easiest to test it yourself. I made the vector much larger for comparison:

    z <- sample(c(TRUE, FALSE), 1000000, rep = TRUE)

    system.time(sum(z))
    #    user  system elapsed
    #    0.03    0.00    0.03

    system.time(length(z[z == TRUE]))
    #    user  system elapsed
    #    0.75    0.07    0.83

    system.time(length(which(z)))
    #    user  system elapsed
    #    1.34    0.28    1.64

    system.time(table(z)["TRUE"])
    #    user  system elapsed
    #   10.62    0.52   11.19

So clearly using sum is the best approach in this case. You may also want to check for NA values as Marek suggested.
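If you want to repeat the comparison, here is a base-R sketch. Timings depend on your machine and R version, so none are shown; the seed, the size `n` and the `approaches` list are illustrative choices. It also checks that all four approaches agree on a vector without NA.

```r
set.seed(1)  # reproducible sample
n <- 1e6
z <- sample(c(TRUE, FALSE), n, replace = TRUE)

approaches <- list(
  sum          = function(z) sum(z),
  index_length = function(z) length(z[z == TRUE]),
  which_length = function(z) length(which(z)),
  table        = function(z) as.integer(table(z)["TRUE"])
)

# Elapsed seconds for each approach (values vary by machine)
timings <- vapply(approaches,
                  function(f) system.time(f(z))[["elapsed"]],
                  numeric(1))
print(timings)

# All approaches should return the same count when z has no NA
counts <- vapply(approaches, function(f) f(z), integer(1))
stopifnot(length(unique(counts)) == 1)
```

For finer-grained results, repeating each call several times and comparing medians is more reliable than a single `system.time()` run.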

Just to add a note regarding NA values and the which function:

    > which(c(T, F, NA, NULL, T, F))
    [1] 1 4
    > which(!c(T, F, NA, NULL, T, F))
    [1] 2 5

Note that which only returns the positions of TRUE values, so NA entries are effectively skipped (and the NULL is dropped by c() before which ever sees it). Passing a non-logical vector to which is an error, not a silent skip.

answered Oct 01 '22 at 07:10 by Shane
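To make the NA handling explicit, here is a small sketch that counts TRUE, FALSE and NA separately using only `which()`, `is.na()` and `length()`/`sum()`. The helper name `count_logical` is hypothetical; on the vector from the example above it gives 2 TRUE, 2 FALSE and 1 NA.

```r
# Hypothetical helper: count each logical state separately
count_logical <- function(z) {
  c(
    true  = length(which(z)),   # which() skips NA
    false = length(which(!z)),  # !NA is still NA, so it is skipped too
    na    = sum(is.na(z))
  )
}

x <- c(TRUE, FALSE, NA, NULL, TRUE, FALSE)  # NULL is dropped by c(), leaving 5 elements
print(count_logical(x))
```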