I am trying to test for a condition across a range of columns. The data looks something like this:
   Name DPD_1 DPD_2 DPD_3 Default_flag
1:    A    46    63   138         TRUE
2:    B    12    82    33        FALSE
3:    C    95    71    55         TRUE
4:    D    57   133   116         TRUE
5:    E    48    27   137         TRUE
In the code I need to test whether any of DPD_1, DPD_2 or DPD_3 is greater than or equal to 90, in which case Default_flag gets set to TRUE.
The code I am using for this is given below
library(data.table)
df1 <- data.table(Name = LETTERS[1:10], DPD_1 = sample(1:100, 10), DPD_2 = sample(1:200, 10), DPD_3 = sample(1:200, 10))
df1[,Default_flag := ifelse((DPD_1>=90 | DPD_2>=90 | DPD_3>=90 ),TRUE,FALSE)]
The problem is that with some datasets I need to extend the DPD checks from DPD_1 up to, say, DPD_24 (checking 24 columns instead of just the 3 in this example). Is there any way I can avoid specifying each DPD column in the ifelse statement? I am happy to drop the ifelse statement, and if some version of apply works, I would be happy to use that too.
We can use Reduce with | after specifying the columns of interest in .SDcols:
df1[, Default_flag := Reduce(`|`, lapply(.SD, `>=`, 90)), .SDcols = DPD_1:DPD_3]
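To see what the Reduce call above does, here is a minimal sketch run on the five sample rows from the question instead of random data. The names dt, dpd_cols and Flag_rowsums are only illustrative, and the rowSums line is an alternative I am adding for comparison, not part of the original approach.

```r
library(data.table)

# Rows copied from the sample table in the question
dt <- data.table(
  Name  = c("A", "B", "C", "D", "E"),
  DPD_1 = c(46, 12, 95, 57, 48),
  DPD_2 = c(63, 82, 71, 133, 27),
  DPD_3 = c(138, 33, 55, 116, 137)
)

# Illustrative helper: every column whose name starts with "DPD_"
dpd_cols <- grep("^DPD_", names(dt), value = TRUE)

# lapply builds one logical vector per column; Reduce ORs them element-wise
dt[, Default_flag := Reduce(`|`, lapply(.SD, `>=`, 90)), .SDcols = dpd_cols]

# Alternative (assumption, not from the answer): count the TRUEs per row
dt[, Flag_rowsums := rowSums(.SD >= 90) > 0, .SDcols = dpd_cols]

print(dt)
# Default_flag is TRUE, FALSE, TRUE, TRUE, TRUE, matching the question's table
```

Both columns should agree. The Reduce version avoids building an intermediate matrix, which may matter on wide tables.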
Based on the OP's comment, if we need a function that detects the column names automatically, use grep to get the column names matching a pattern. The function below takes the dataset, a pattern ('pat'), a value to compare against ('val'), and 'n', the number of matching columns to check (the first 'n' matches, in column order).
f1 <- function(dat, pat, val, n) {
  tmp <- as.data.table(dat)
  nm1 <- head(grep(pat, names(tmp), value = TRUE), n)
  tmp[, Default_flag := Reduce(`|`, lapply(.SD, `>=`, val)), .SDcols = nm1][]
}
f1(df1, "DPD", 90, 2)
f1(df1, "DPD", 90, 3)
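For the 24-column case from the question, a rough sketch might look like the following. The table wide, the seed, and the way the columns are generated are all made up for illustration. Building the names with paste0 is one way to be sure exactly DPD_1 to DPD_24 are checked, regardless of column order.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical wide table with DPD_1 ... DPD_24
n_cols <- 24
wide <- data.table(Name = LETTERS[1:10])
for (i in seq_len(n_cols)) {
  set(wide, j = paste0("DPD_", i), value = sample(1:200, 10))
}

# Build the exact column names instead of relying on column order
cols <- paste0("DPD_", seq_len(n_cols))

# Same Reduce idiom as above, applied to all 24 columns
wide[, Default_flag := Reduce(`|`, lapply(.SD, `>=`, 90)), .SDcols = cols]

wide[, .(Name, Default_flag)]
```

The only thing that changes between 3 and 24 columns is the vector passed to .SDcols.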
As per @aelwan's request, an option using tidyverse would be:
library(tidyverse)
f2 <- function(dat, pat, val, n) {
  pat <- quo_name(enquo(pat))
  nm1 <- head(grep(pat, names(dat), value = TRUE), n)
  dat %>%
    mutate_at(vars(nm1), funs(. >= val)) %>%
    select_at(nm1) %>%
    reduce(`|`) %>%
    mutate(dat, Default_flag = .)
}
f2(df1, DPD, 90, 2)
f2(df1, DPD, 90, 3)
identical(f1(df1, "DPD", 90, 2), as.data.table(f2(df1, DPD, 90, 2)))
#[1] TRUE
identical(f1(df1, "DPD", 90, 3), as.data.table(f2(df1, DPD, 90, 3)))
#[1] TRUE