
how to find if all elements in a subset of a data.frame row are TRUE

Tags:

r

I have a data.frame with a block of columns that are logicals, e.g.

> tmp <- data.frame(a=c(13, 23, 52),
+                   b=c(TRUE,FALSE,TRUE),
+                   c=c(TRUE,TRUE,FALSE),
+                   d=c(TRUE,TRUE,TRUE))
> tmp
   a     b     c    d
1 13  TRUE  TRUE TRUE
2 23 FALSE  TRUE TRUE
3 52  TRUE FALSE TRUE

I'd like to compute a summary column (say: e) that is a logical AND over the whole range of logical columns. In other words, for a given row, if all b:d are TRUE, then e would be TRUE; if any b:d are FALSE, then e would be FALSE.

My expected result is:

> tmp
   a     b     c    d     e
1 13  TRUE  TRUE TRUE  TRUE
2 23 FALSE  TRUE TRUE FALSE
3 52  TRUE FALSE TRUE FALSE

I want to specify the range of columns by index, because I have many columns and their names are cumbersome. The following code works, but I'd rather use a vectorized approach to improve performance.

> tmp$e <- NA
> for(i in 1:nrow(tmp)){
+     tmp[i,"e"] <- all(tmp[i,2:(ncol(tmp)-1)]==TRUE)
+ }
> tmp
   a     b     c    d     e
1 13  TRUE  TRUE TRUE  TRUE
2 23 FALSE  TRUE TRUE FALSE
3 52  TRUE FALSE TRUE FALSE

Any way to do this without using a for loop to step through the rows of the data.frame?

mac asked Jul 09 '12 22:07



1 Answer

You can use rowSums to total each row without an explicit loop, plus a little footwork to pick out the logical columns automatically:

# identify the logical columns
boolCols <- sapply(tmp, is.logical)
# sum each row of the logical columns (TRUE counts as 1) and
# compare to the total number of logical columns;
# drop = FALSE keeps a data.frame even if only one column is logical
tmp$e <- rowSums(tmp[, boolCols, drop = FALSE]) == sum(boolCols)
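
Since the question asks for selecting the block by column index rather than by type, the same rowSums idea can probably be adapted as in the minimal sketch below. It assumes the logical block sits in columns 2 to 4 and contains no NA values; the names idx and e_alt are made up for illustration.

```r
# Same data as in the question.
tmp <- data.frame(a = c(13, 23, 52),
                  b = c(TRUE, FALSE, TRUE),
                  c = c(TRUE, TRUE, FALSE),
                  d = c(TRUE, TRUE, TRUE))

# Hypothetical index range for the logical block (columns b to d).
idx <- 2:4

# A row is all TRUE when its count of TRUE values equals the number
# of columns in the block. drop = FALSE guards against a one-column range.
tmp$e <- rowSums(tmp[, idx, drop = FALSE]) == length(idx)

# Alternative: fold the columns together with an elementwise AND.
e_alt <- Reduce(`&`, tmp[idx])
```

Both forms should give TRUE, FALSE, FALSE for the example data. If the block can contain NA, rowSums returns NA for the affected rows unless na.rm = TRUE is passed, so it is worth checking for missing values first.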
Joshua Ulrich answered Oct 06 '22 04:10