Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Applying function rowwise efficiently

I have a dataframe with multiple columns containing information on one diagnosis. The entries are TRUE, FALSE or NA. I create a vector which summarizes those columns as follows: If a patient was diagnosed at some time (TRUE), then TRUE, if the only valid entry is FALSE, then FALSE and if there just missings, then NA. Written text as code:

data.frame(a= c(FALSE, TRUE, NA, FALSE, TRUE, NA, FALSE, TRUE, NA),
           b= c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, NA, NA, NA),
           expected= c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, NA))

I need to go trough all the columns rowwise and I do so using split. Unfortunatelly, my data is big and it takes a long while. What I do at the moment is

library(magrittr)
# big example data
df <- expand.grid(c(FALSE, TRUE, NA), c(FALSE, TRUE, NA)) %>%
  .[rep(1:nrow(.), 50000), ] %>%
  as.data.frame() %>%
  setNames(., nm= c("a", "b"))

# My approach
df$res <- df %>%
  split(., 1:nrow(.)) %>%
  lapply(., function(row_i){
    ifelse(all(is.na(row_i)), NA,
           ifelse(any(row_i, na.rm= TRUE), TRUE,
                  ifelse(any(!row_i, na.rm= TRUE), FALSE,
                         row_i)))
  }) %>%
  unlist()

Is there a more efficient way to solve this task?

like image 967
Igor stands with Ukraine Avatar asked Feb 02 '26 21:02

Igor stands with Ukraine


1 Answers

A vectorized solution using pmax():

df$result <- as.logical(do.call(\(...) pmax(..., na.rm = TRUE), df[1:2]))

df
#       a     b expected result
# 1 FALSE FALSE    FALSE  FALSE
# 2  TRUE FALSE     TRUE   TRUE
# 3    NA FALSE    FALSE  FALSE
# 4 FALSE  TRUE     TRUE   TRUE
# 5  TRUE  TRUE     TRUE   TRUE
# 6    NA  TRUE     TRUE   TRUE
# 7 FALSE    NA    FALSE  FALSE
# 8  TRUE    NA     TRUE   TRUE
# 9    NA    NA       NA     NA

You can also merge all the parameters into a list to avoid the anonymous function in do.call(). I rewrite it as a function rowAnys to complement rowSums/rowMeans in base.

rowAnys <- function(x) {
  as.logical(do.call(pmax, c(na.rm = TRUE, x)))
}

You could also use pmin to implement rowwise-all().

rowAlls <- function(x) {
  as.logical(do.call(pmin, c(na.rm = TRUE, x)))
}
df$any <- rowAnys(df[1:2])
df$all <- rowAlls(df[1:2])

df
#       a     b expected   any   all
# 1 FALSE FALSE    FALSE FALSE FALSE
# 2  TRUE FALSE     TRUE  TRUE FALSE
# 3    NA FALSE    FALSE FALSE FALSE
# 4 FALSE  TRUE     TRUE  TRUE FALSE
# 5  TRUE  TRUE     TRUE  TRUE  TRUE
# 6    NA  TRUE     TRUE  TRUE  TRUE
# 7 FALSE    NA    FALSE FALSE FALSE
# 8  TRUE    NA     TRUE  TRUE  TRUE
# 9    NA    NA       NA    NA    NA
like image 146
Darren Tsai Avatar answered Feb 04 '26 13:02

Darren Tsai



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!