Consider
target <- "vs"
value <- 1
library(data.table)
dt <- as.data.table(head(mtcars))
So I'm trying to pass both column name and a value as variables into the j
expression within data.table
environment, something that would be equivalent to
dt[, vs == 1]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
If only the value is the variable, it works nicely
dt[, vs == value]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
We can also call the column within the data.table scope when it's a variable
dt[, target, with = FALSE]
# vs
# 1: 0
# 2: 0
# 3: 1
# 4: 1
# 5: 0
# 6: 1
But I can't figure how to combine the two in a simple manner
Note: I'm well aware that I can simply do:
dt[[target]] == value
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
But I need it within the data table scope so I could modify other columns by reference, something like
dt[, NEWCOL := sum(vs == 1), by = am]
So here are my tries when both column name and the value are variables
dt[, target == value, with = FALSE]
# Null data.table (0 rows and 0 cols)
dt[, target == value]
# [1] FALSE
dt[, (target) == value]
# [1] FALSE
dt[, .(target == value)]
# V1
# 1: FALSE
dt[, eval(target) == value]
# [1] FALSE
dt[target %in% value]
## Empty data.table (0 rows) of 11 cols: mpg,cyl,disp,hp,drat,wt...
Eventually I came up with
dt[, .SD[[target]] == value]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
but it is very inefficient, here's a simple benchmark
set.seed(123)
n <- 1e6
dt <- data.table(vs = sample(1L:30L, n, replace = TRUE), am = seq_len(n))
system.time(dt[, NEWCOL := sum(.SD[[target]] == value), by = am])
# user system elapsed
# 13.00 0.02 13.12
system.time(dt[, NEWCOL2 := sum(vs == value), by = am])
# user system elapsed
# 0.82 0.00 0.83
Question: Is there any better way of doing this that I'm missing here? Something either more idiomatic or much more efficient
Edit
Originally I was looking for something idiomatic, so I thought @GGrothendieck simple solution using get
was the one, but surprisingly all @Richard version are beating even the version that ins't doing any evaluation of the column name
set.seed(123)
n <- 1e7
dt <- data.table(vs = sample(1L:30L, n, replace = TRUE), am = seq_len(n))
cl <- substitute(
x == y,
list(x = as.name(target), y = value)
)
cl2 <- call("==", as.name(target), value)
system.time(dt[, NEWCOL := sum(vs == value), by = am])
# user system elapsed
# 0.83 0.00 0.82
system.time(dt[, NEWCOL1 := sum(.SD[[target]] == value), by = am])
# user system elapsed
# 8.97 0.00 8.97
system.time(dt[, NEWCOL2 := sum(get(target) == value), by = am])
# user system elapsed
# 2.35 0.00 2.37
system.time(dt[, NEWCOL3 := sum(eval(cl)), by = am])
# user system elapsed
# 0.69 0.02 0.71
system.time(dt[, NEWCOL4 := sum(eval(cl2)), by = am])
# user system elapsed
# 0.76 0.00 0.77
system.time(dt[, NEWCOL5 := sum(eval(as.name(target)) == value), by = am])
# user system elapsed
# 0.78 0.00 0.78
Here is one possible alternative.
target <- "vs"
value <- 1
dt <- as.data.table(head(mtcars))
In terms of code it's not necessarily simpler, but we can set up an unevaluated call cl
defined outside the scope of dt
which is to be evaluated inside the data table's environment.
cl <- substitute(
x == y,
list(x = as.name(target), y = value)
)
substitute()
might be necessary for longer expressions. But in this case, call()
would shorten the code and create the same cl
result. And so cl
could also be
cl <- call("==", as.name(target), value)
Now we can evaluate cl
inside dt
. On your example this seems to work fine.
dt[, NEWCOL := sum(eval(cl)), by = am][]
# mpg cyl disp hp drat wt qsec vs am gear carb NEWCOL
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2
After thinking about this for a minute, I'm not sure value
needed to be substituted, and hence the following also works. But as David notes, the first approach is more time efficient.
dt[, eval(as.name(target)) == value]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With