I have a data.frame with a block of columns that are logicals, e.g.
> tmp <- data.frame(a=c(13, 23, 52),
+ b=c(TRUE,FALSE,TRUE),
+ c=c(TRUE,TRUE,FALSE),
+ d=c(TRUE,TRUE,TRUE))
> tmp
a b c d
1 13 TRUE TRUE TRUE
2 23 FALSE TRUE TRUE
3 52 TRUE FALSE TRUE
I'd like to compute a summary column (say: e) that is a logical AND over the whole range of logical columns. In other words, for a given row, if all of b:d are TRUE, then e would be TRUE; if any of b:d are FALSE, then e would be FALSE.
My expected result is:
> tmp
a b c d e
1 13 TRUE TRUE TRUE TRUE
2 23 FALSE TRUE TRUE FALSE
3 52 TRUE FALSE TRUE FALSE
I want to indicate the range of columns by indices, as I have a bunch of columns and the names are cumbersome. The following code works, but I'd rather use a vectorized approach to improve performance.
> tmp$e <- NA
> for(i in 1:nrow(tmp)){
+ tmp[i,"e"] <- all(tmp[i,2:(ncol(tmp)-1)]==TRUE)
+ }
> tmp
a b c d e
1 13 TRUE TRUE TRUE TRUE
2 23 FALSE TRUE TRUE FALSE
3 52 TRUE FALSE TRUE FALSE
Is there any way to do this without using a for loop to step through the rows of the data.frame?
You can use rowSums to handle all rows at once instead of writing an explicit loop, plus some fancy footwork to make it quasi-automated:
# identify the logical columns
boolCols <- sapply(tmp, is.logical)
# sum each row of the logical columns and
# compare to the total number of logical columns
# (drop = FALSE keeps a data.frame even if only one column is logical)
tmp$e <- rowSums(tmp[, boolCols, drop = FALSE]) == sum(boolCols)
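Since the question asks to pick the columns by index rather than by type, here is a minimal sketch of the same idea with an explicit index range. The name idx and the range 2:4 are assumptions that fit the example data, so adjust them to your own columns. Reduce() with `&` is swapped in here as an alternative to rowSums; it ANDs the selected columns together element by element.

```r
# Example data from the question
tmp <- data.frame(a = c(13, 23, 52),
                  b = c(TRUE, FALSE, TRUE),
                  c = c(TRUE, TRUE, FALSE),
                  d = c(TRUE, TRUE, TRUE))

# Hypothetical index range covering the logical columns b:d
idx <- 2:4

# Element-wise AND across the selected columns, one result per row
tmp$e <- Reduce(`&`, tmp[idx])

# The rowSums approach with the same index range gives the same answer
e2 <- unname(rowSums(tmp[, idx, drop = FALSE]) == length(idx))

tmp$e
# [1]  TRUE FALSE FALSE
```

One difference worth knowing: if the columns contain NA, the Reduce version returns FALSE for a row that has any FALSE and NA otherwise, while the rowSums comparison returns NA for any row containing an NA unless you pass na.rm = TRUE.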