Consider the data:
library(data.table)
library(magrittr)
vec1 <- c("Iron", "Copper")
vec2 <- c("Defective", "Passed", "Error")
set.seed(123)
a1 <- sample(x = vec1, size = 20, replace = T)
b1 <- sample(x = vec2, size = 20, replace = T)
set.seed(1234)
a2 <- sample(x = vec1, size = 20, replace = T)
b2 <- sample(x = vec2, size = 20, replace = T)
DT <- data.table(
c(1:20), a1, b1, a2, b2
) %>% .[order(V1)]
names(DT) <- c("id", "prod_name_1", "test_1", "prod_name_2", "test_2")
I need to keep the rows whose value for test_1 OR test_2 is "Passed". In other words, if neither of these columns has the specified value, the row should be dropped. With dplyr, we can use the filter_at() verb:
> # dplyr solution...
>
> cols <- grep(x = names(DT), pattern = "test", value = T, ignore.case = T)
>
>
> DT %>%
+ dplyr::filter_at(.vars = grep(x = names(DT), pattern = "test", value = T, ignore.case = T),
+ dplyr::any_vars(. == "Passed")) -> DT.2
>
> DT.2
id prod_name_1 test_1 prod_name_2 test_2
1 3 Iron Passed Copper Defective
2 5 Copper Passed Copper Defective
3 7 Copper Passed Iron Passed
4 8 Copper Passed Iron Error
5 11 Copper Error Copper Passed
6 14 Copper Error Copper Passed
7 16 Copper Passed Copper Error
Cool. Is there a similar way to perform this operation in data.table?
This is the closest I've got:
> lapply(seq_along(cols), function(x){
+
+ setkeyv(DT, cols[[x]])
+
+ DT["Passed"]
+
+ }) %>%
+ do.call(rbind,.) %>%
+ unique -> DT.3
>
> DT.3
id prod_name_1 test_1 prod_name_2 test_2
1: 3 Iron Passed Copper Defective
2: 5 Copper Passed Copper Defective
3: 8 Copper Passed Iron Error
4: 16 Copper Passed Copper Error
5: 7 Copper Passed Iron Passed
6: 11 Copper Error Copper Passed
7: 14 Copper Error Copper Passed
>
> identical(data.table(DT.2)[order(id)], DT.3[order(id)])
[1] TRUE
Does anyone have a more elegant solution? Preferably something contained in a verb like dplyr::filter_at().
We can specify 'cols' in .SDcols, loop through the Subset of Data.table (.SD) to check whether each value is "Passed", Reduce the resulting list to a single logical vector with |, and use that vector to subset the rows:
res2 <- DT[DT[, Reduce(`|`, lapply(.SD, `==`, "Passed")), .SDcols = cols]]
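To make the inner expression easier to follow, here is a minimal sketch that splits it into its three steps. The table `toy` and its values are made up for illustration and are not the data from the question.

```r
library(data.table)

# Toy table, invented for illustration only
toy <- data.table(
  id     = 1:4,
  test_1 = c("Passed", "Error", "Defective", "Error"),
  test_2 = c("Error", "Passed", "Defective", "Passed")
)
cols <- c("test_1", "test_2")

# Step 1: one logical column per entry in .SDcols
flags <- toy[, lapply(.SD, `==`, "Passed"), .SDcols = cols]

# Step 2: combine the columns element-wise with OR into one logical vector
keep <- Reduce(`|`, flags)

# Step 3: use that vector as the row filter in i
toy[keep]
```

Row 3 is the only one dropped, since neither of its test columns equals "Passed". Swapping `|` for `&` in step 2 would instead keep only rows where every selected column matches.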
Comparing with the dplyr output in the OP's post (res1 here, DT.2 in the question):
identical(as.data.table(res1), res2)
#[1] TRUE
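Since the question asks for something verb-like, the same idea could be wrapped in a small function. The sketch below is only one possible shape: `filter_any_equal()` is a hypothetical helper, not part of data.table, and the toy table is invented for the usage example. The comparison is done on the extracted columns rather than inside j, so that a column that happens to be named `value` cannot shadow the function argument.

```r
library(data.table)

# Hypothetical helper (not a data.table function): keep the rows where any
# column whose name matches `pattern` equals `value`.
filter_any_equal <- function(dt, pattern, value) {
  cols <- grep(pattern, names(dt), value = TRUE, ignore.case = TRUE)
  stopifnot(length(cols) > 0)
  # One logical vector per matching column, then OR them together
  keep <- Reduce(`|`, lapply(cols, function(col) dt[[col]] == value))
  dt[keep]
}

# Usage on a made-up table
toy <- data.table(
  id     = 1:3,
  test_1 = c("Passed", "Error", "Error"),
  test_2 = c("Error", "Error", "Passed")
)
res <- filter_any_equal(toy, "test", "Passed")
res
```

With the question's data, the call would presumably be `filter_any_equal(DT, "test", "Passed")`.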