I'm trying to subset a given data.table
DT <- data.table(
  a = 1:20,
  b = 3:4,   # recycled to 20 rows
  c = 5:14,  # recycled to 20 rows
  d = 1:4    # recycled to 20 rows
)
within a function, using a parameter that is a named list:
param <- list(a = 1:10,
b = 2:3,
c = c(5, 7, 10))
I may be a bit stuck here, but I certainly don't want to implement something ugly like this, especially since it's not very dynamic:
DT[(if (!is.null(param$a)) a %in% param$a else TRUE) &
   (if (!is.null(param$b)) b %in% param$b else TRUE) &
   (if (!is.null(param$c)) c %in% param$c else TRUE) &
   (if (!is.null(param$d)) d %in% param$d else TRUE)]
a b c d
1: 1 3 5 1
2: 3 3 7 3
Any ideas how to achieve this elegantly, in data.table or base R, by using the names of the list to subset the corresponding columns of the data.table with the associated values? Thanks!
EDIT
I performed a microbenchmark with some of the answers:
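func_4 <- function(myp, DT) {
  # drop NULL entries from the parameter list (the argument, not the global param)
  myp = Filter(Negate(is.null), myp)
  # build one `column %in% values` call per list element
  exs = Map(function(var, val)
    call("%in%", var, val),
    var = sapply(names(myp), as.name),
    val = myp)
  # chain the calls with & and wrap them in DT[...]
  exi = Reduce(function(x, y)
    call("&", x, y), exs)
  ex = call("[", x = as.name("DT"), i = exi)
  # eval(as.call(c(as.list(ex))))
  eval(ex)
}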
library(microbenchmark)
microbenchmark(
(DT[do.call(pmin, Map(`%in%`, DT[, names(param), with = FALSE], param)) == 1L]),
(DT[rowSums(mapply(`%in%`, DT[, names(param), with = FALSE], param)) == length(param)]),
(DT[do.call(CJ, param), on = names(param), nomatch = NULL]),
(DT[expand.grid(param), on = names(param), nomatch = NULL]),
(DT[DT[, all(mapply(`%in%`, .SD, param)), by = 1:nrow(DT), .SDcols = names(param)]$V1]),
(func_4(myp = param, DT = DT)),
times = 200)
min lq mean median uq max neval
446.656 488.5365 565.5597 511.403 533.7785 7167.847 200
454.120 516.3000 566.8617 538.146 561.8965 1840.982 200
2433.450 2538.6075 2732.4749 2606.986 2704.5285 10302.085 200
2478.595 2588.7240 2939.8625 2642.311 2743.9375 10722.578 200
2648.707 2761.2475 3040.4926 2814.177 2903.8845 10334.822 200
3243.040 3384.6220 3764.5087 3484.423 3596.9140 14873.898 200
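Because the variants are timed against each other, it may help to first confirm that they select the same rows; the join-based ones return rows in lookup order rather than in DT's original order. A minimal sketch under two assumptions: DT is rebuilt with rep_len() to match the question's data, and column a identifies rows because it is unique in this DT. The list labels pmin, rowSums, CJ and byrow are only illustrative:

```r
library(data.table)

# The question's data; rep_len() spells out the recycling to 20 rows
DT <- data.table(a = 1:20, b = rep_len(3:4, 20),
                 c = rep_len(5:14, 20), d = rep_len(1:4, 20))
param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))
cols <- names(param)

# Four of the benchmarked expressions, written with a shared `cols` vector
results <- list(
  pmin    = DT[do.call(pmin, Map(`%in%`, DT[, cols, with = FALSE], param)) == 1L],
  rowSums = DT[rowSums(mapply(`%in%`, DT[, cols, with = FALSE], param)) == length(param)],
  CJ      = DT[do.call(CJ, param), on = cols, nomatch = NULL],
  byrow   = DT[DT[, all(mapply(`%in%`, .SD, param)), by = 1:nrow(DT), .SDcols = cols]$V1]
)

# a is unique in this DT, so the sorted a values identify which rows were selected
selected <- lapply(results, function(res) sort(res$a))
same <- vapply(selected, identical, logical(1), selected[[1]])
print(same)
```

Comparing sorted row identifiers rather than whole tables sidesteps both the ordering difference and any column-type differences the join may introduce.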
You can use the CJ (Cross Join) function from data.table to make a filtering table from the list.
lookup <- do.call(CJ, param)
head(lookup)
# a b c
# 1: 1 2 5
# 2: 1 2 7
# 3: 1 2 10
# 4: 1 3 5
# 5: 1 3 7
# 6: 1 3 10
DT[
lookup,
on = names(lookup),
nomatch = NULL
]
# a b c d
# 1: 1 3 5 1
# 2: 3 3 7 3
Note that nomatch = NULL (equivalent to the older nomatch = 0) means that any combination in lookup that doesn't exist in DT won't return a row.
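To see what nomatch changes, the same join can be run with and without it. A minimal sketch, rebuilding the question's DT with rep_len(); note that lookup holds every combination of the filter values, prod(lengths(param)) rows, so it grows multiplicatively with the number and length of the vectors in param:

```r
library(data.table)

DT <- data.table(a = 1:20, b = rep_len(3:4, 20),
                 c = rep_len(5:14, 20), d = rep_len(1:4, 20))
param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))
lookup <- do.call(CJ, param)

# Default nomatch = NA: one row per lookup combination, with d = NA where DT has no match
with_na <- DT[lookup, on = names(lookup)]
# nomatch = NULL: combinations without a match in DT are dropped
matched <- DT[lookup, on = names(lookup), nomatch = NULL]

nrow(lookup)            # 60, i.e. 10 * 2 * 3 combinations
nrow(with_na)           # 60
sum(!is.na(with_na$d))  # 2
nrow(matched)           # 2
```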
Using Map, we can do
DT[DT[, all(unlist(Map(`%in%`, .SD, param))), by = 1:nrow(DT)]$V1]
# a b c d
#1: 1 3 5 1
#2: 3 3 7 3
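For each row, we check whether each value is present in the corresponding element of param. Map pairs the columns of .SD with the elements of param by position, so .SD should contain only the columns named in param, in the same order.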
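The per-row check can be tried on single rows outside the grouping to see what it computes. A minimal sketch, assuming the question's DT rebuilt with rep_len(); each row is restricted to the columns named in param so that mapply pairs every column with the list element of the same name, and mapply is used here because it returns a logical vector that all() accepts:

```r
library(data.table)

DT <- data.table(a = 1:20, b = rep_len(3:4, 20),
                 c = rep_len(5:14, 20), d = rep_len(1:4, 20))
param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10))

# Row 1 is (a = 1, b = 3, c = 5); row 2 is (a = 2, b = 4, c = 6)
row1 <- DT[1L, names(param), with = FALSE]
row2 <- DT[2L, names(param), with = FALSE]

mapply(`%in%`, row1, param)       # one TRUE/FALSE per column, named a, b, c
all(mapply(`%in%`, row1, param))  # row 1 is kept
all(mapply(`%in%`, row2, param))  # row 2 is dropped: b = 4 and c = 6 are not allowed
```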
Thanks to @Frank, the Map version above can be improved by using mapply and restricting .SD to the filtered columns with .SDcols:
DT[DT[, all(mapply(`%in%`, .SD, param)), by = 1:nrow(DT), .SDcols=names(param)]$V1]
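Inside a function, the NULL handling from the question's if/else chain can be done once up front with Filter(), after which any of the approaches above applies. A minimal sketch using a vectorised Reduce/Map form instead of per-row grouping; the function name filter_by_list is hypothetical, and it assumes every non-NULL name in param is a column of DT:

```r
library(data.table)

DT <- data.table(a = 1:20, b = rep_len(3:4, 20),
                 c = rep_len(5:14, 20), d = rep_len(1:4, 20))

# Hypothetical helper: keep rows whose values lie in the matching param element
filter_by_list <- function(DT, param) {
  # Drop NULL entries, mirroring the `if (!is.null(...)) ... else TRUE` branches
  param <- Filter(Negate(is.null), param)
  if (length(param) == 0L) return(DT)
  # One logical vector per column, looked up by name rather than by position
  keep <- Map(function(col, vals) DT[[col]] %in% vals, names(param), param)
  # Combine the column-wise tests with & and subset
  DT[Reduce(`&`, keep)]
}

param <- list(a = 1:10, b = 2:3, c = c(5, 7, 10), d = NULL)
filter_by_list(DT, param)
```

Looking columns up by name means the order of elements in param does not matter, and an all-NULL param simply returns DT unchanged.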