How do I test for a condition across a range of columns in R

Question

I am trying to test for a condition across a range of columns. The data looks something like this:

      Name DPD_1 DPD_2 DPD_3 Default_flag
 1:    A    46    63   138         TRUE
 2:    B    12    82    33        FALSE
 3:    C    95    71    55         TRUE
 4:    D    57   133   116         TRUE
 5:    E    48    27   137         TRUE

In the code, I need to test whether any of DPD_1, DPD_2 or DPD_3 is greater than or equal to 90, in which case Default_flag is set to TRUE.

The code I am using for this is given below:

library(data.table)
df1 <- data.table(Name = LETTERS[1:10], DPD_1 = sample(1:100, 10), DPD_2 = sample(1:200, 10), DPD_3 = sample(1:200, 10))
df1[, Default_flag := ifelse((DPD_1 >= 90 | DPD_2 >= 90 | DPD_3 >= 90), TRUE, FALSE)]
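As a side note, the chain of | comparisons already returns a logical vector, so the ifelse(..., TRUE, FALSE) wrapper is not needed. A minimal sketch, using the values from the table printed above (the object name dt is only for illustration):

```r
library(data.table)

# Values copied from the table in the question
dt <- data.table(Name  = c("A", "B", "C", "D", "E"),
                 DPD_1 = c(46, 12, 95, 57, 48),
                 DPD_2 = c(63, 82, 71, 133, 27),
                 DPD_3 = c(138, 33, 55, 116, 137))

# The comparison chain is already TRUE/FALSE, so no ifelse() is required
dt[, Default_flag := DPD_1 >= 90 | DPD_2 >= 90 | DPD_3 >= 90]
dt
```

This reproduces the Default_flag column shown in the question (only row B is FALSE), but it still names every column, which is the problem the answer below addresses.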

Now the problem is that with some datasets I need to extend the checks from DPD_1 to, say, DPD_24 (checking 24 columns instead of just the 3 in the current example). Is there any way I can avoid specifying each DPD column in the ifelse statement? I am happy to drop the ifelse statement, and if some version of apply can work, I would be happy to use that too.

akrun · Accepted Answer

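We can use Reduce with | after specifying the columns of interest in .SDcols. Here lapply(.SD, `>=`, 90) compares each selected column with 90, and Reduce(`|`, ...) combines the resulting logical vectors with an element-wise OR: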

df1[, Default_flag :=  Reduce(`|`, lapply(.SD, `>=`, 90)), .SDcols = DPD_1:DPD_3]
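To see what the lapply/Reduce combination does on its own, here is a small sketch on a plain list standing in for .SD (the columns are the first three rows of the question's table; the names cols, cmp and res are illustrative):

```r
# A plain list plays the role of .SD: one element per column
cols <- list(DPD_1 = c(46, 12, 95),
             DPD_2 = c(63, 82, 71),
             DPD_3 = c(138, 33, 55))

# lapply calls `>=`(column, 90) on each element, giving one logical vector per column
cmp <- lapply(cols, `>=`, 90)

# Reduce folds `|` from left to right: (cmp[[1]] | cmp[[2]]) | cmp[[3]]
res <- Reduce(`|`, cmp)
res
# [1]  TRUE FALSE  TRUE
```

Because Reduce works on a list of any length, the same expression covers 3 or 24 columns without naming them individually.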

Update

Based on the OP's comment, if we need a function that detects the column names automatically, we can use grep to get the column names matching a pattern. The function below takes the dataset, a pattern ('pat'), a value to compare against ('val') and 'n', i.e. the number of columns matching the pattern to use (the first n matches in column order):

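library(data.table)

f1 <- function(dat, pat, val, n){
  tmp <- as.data.table(dat)                              # convert the input to a data.table
  nm1 <- head(grep(pat, names(tmp), value = TRUE), n)    # first n column names matching 'pat'
  # OR together the column-wise comparisons; the trailing [] prints the result
  tmp[, Default_flag := Reduce(`|`, lapply(.SD, `>=`, val)), .SDcols = nm1][]
}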

f1(df1, "DPD", 90, 2)
f1(df1, "DPD", 90, 3)
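Since the question also asks about apply, here is a base R sketch for the 24-column case. It is an alternative to the data.table answer, not part of it; the data are randomly generated and the object names (dat, cols, flag_apply, flag_rowsums) are hypothetical. An anchored pattern such as "^DPD_" avoids picking up other columns that merely contain "DPD":

```r
# Hypothetical data: 10 rows and 24 DPD columns of random values
set.seed(1)
dat <- data.frame(Name = LETTERS[1:10])
for (i in 1:24) dat[[paste0("DPD_", i)]] <- sample(1:200, 10)

# All columns whose names start with "DPD_"
cols <- grep("^DPD_", names(dat), value = TRUE)

# Comparing a data frame with a number gives a logical matrix;
# apply() then checks each row for at least one TRUE
flag_apply <- apply(dat[cols] >= 90, 1, any)

# Vectorised alternative: count the TRUEs per row and test for at least one
flag_rowsums <- rowSums(dat[cols] >= 90) > 0

dat$Default_flag <- unname(flag_rowsums)
```

rowSums avoids the per-row function calls of apply, so it tends to scale better as the number of rows grows; both give the same flags.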

As per @aelwan's request, an option using the tidyverse would be:

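library(tidyverse)
f2 <- function(dat, pat, val, n){
  pat <- quo_name(enquo(pat))                          # bare name to string, e.g. DPD -> "DPD"
  nm1 <- head(grep(pat, names(dat), value = TRUE), n)  # first n matching column names

  dat %>%
      mutate(across(all_of(nm1), ~ .x >= val)) %>%     # compare each selected column with val
      select(all_of(nm1)) %>%
      reduce(`|`) %>%                                  # OR the logical columns element-wise
      mutate(dat, Default_flag = .)

}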

f2(df1, DPD, 90, 2)
f2(df1, DPD, 90, 3)

identical(f1(df1, "DPD", 90, 2), as.data.table(f2(df1, DPD, 90, 2)))
#[1] TRUE
identical(f1(df1, "DPD", 90, 3), as.data.table(f2(df1, DPD, 90, 3)))
#[1] TRUE
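In more recent dplyr versions (1.0.4 and later), if_any() expresses "the condition holds in at least one of these columns" directly. A minimal sketch, not part of the original answer, using the values from the question's table; unlike f2, it selects every column starting with "DPD_" rather than the first n:

```r
library(dplyr)

# Values copied from the table in the question
df <- data.frame(Name  = c("A", "B", "C", "D", "E"),
                 DPD_1 = c(46, 12, 95, 57, 48),
                 DPD_2 = c(63, 82, 71, 133, 27),
                 DPD_3 = c(138, 33, 55, 116, 137))

# if_any() is TRUE for a row when the predicate is TRUE in any selected column
res <- df %>%
  mutate(Default_flag = if_any(starts_with("DPD_"), ~ .x >= 90))
res
```

To restrict it to the first n columns, the selection could be replaced with all_of() on a character vector such as the nm1 computed inside f2.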

Tags: r, if-statement

ashleych