When I write dt[a>0, {...}, by=...], is {...} processed before or after the a>0 filtering? (It appears that the answer is before.)
I can imagine both orders being useful, so the right question is, I guess: how can I control the order of filtering vs. processing?
The i= argument is (quite sensibly) processed first, as you can confirm with something like the following:
library(data.table)
dt <- data.table(a=c(0,1,0,1), grp=c("a", "a", "b", "b"))
dt
# a grp
# 1: 0 a
# 2: 1 a
# 3: 0 b
# 4: 1 b
## Show that filtering op in i= is performed before processing in j=
dt[a>0, if(any(a<=0)) stop("a<=0 must've been passed on to j") else a, by=grp]
# grp V1
# 1: a 1
# 2: b 1
## Check that an error _is_ thrown when verboten elements make it past the filter
dt[a<=0, if(any(a<=0)) stop("a<=0 must've been passed on to j") else a, by=grp]
# Error in `[.data.table`(dt, a <= 0, if (any(a <= 0)) stop("a<=0 must've been passed on to j") else a, :
#   a<=0 must've been passed on to j
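Another quick way to see that i= is applied before the grouped j= is evaluated is to count the rows each group receives with .N. This is a minimal sketch reusing the same toy table; the object names all_rows and filtered are just illustrative.

```r
library(data.table)

dt <- data.table(a = c(0, 1, 0, 1), grp = c("a", "a", "b", "b"))

# Without i=, each group sees all of its rows (2 per group)
all_rows <- dt[, .N, by = grp]

# With i=, rows failing a > 0 are dropped before grouping,
# so each group only sees its single row with a == 1
filtered <- dt[a > 0, .N, by = grp]

print(all_rows)
print(filtered)
```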
To perform the filtering operation second, just place it in a second call to [.data.table():
dt[,tot:=sum(a),by=grp][a>0,]
# a grp tot
# 1: 1 a 1
# 2: 1 b 1
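Note that := adds the tot column to dt itself, by reference. If you would rather leave dt untouched, a sketch of one alternative (assuming a new result table is what you want) is to compute the per-group value in j= with .() and then filter the returned table in a chained [ call:

```r
library(data.table)

dt <- data.table(a = c(0, 1, 0, 1), grp = c("a", "a", "b", "b"))

# Step 1: per-group processing over ALL rows; .() returns a new table,
#         so dt itself is not modified
# Step 2: filter that result in a second, chained [ call
res <- dt[, .(a, tot = sum(a)), by = grp][a > 0]

print(res)
print(names(dt))  # still only "a" and "grp"
```

Here tot is computed from every row of each group, and only afterwards are the a <= 0 rows dropped.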