
Order of filtering vs processing

Tags: r, data.table

When I write

dt[a>0, {...}, by=...]

is {...} processed before or after a>0 filtering? (it appears that the answer is before).

I can imagine both orders being useful, so I guess the right question is: how can I control the order of filtering vs processing?

asked Mar 27 '14 19:03 by sds


1 Answer

The i= argument is (quite sensibly) processed first, as you can confirm with something like the following.

library(data.table)

dt <- data.table(a=c(0,1,0,1), grp=c("a", "a", "b", "b"))
#    a grp
# 1: 0   a
# 2: 1   a
# 3: 0   b
# 4: 1   b  

## Show that filtering op in i= is performed before processing in j=
dt[a>0, if(any(a<=0)) stop("a<=0 must've been passed on to j") else a, by=grp]
#    grp V1
# 1:   a  1
# 2:   b  1

## Check that error _is_ thrown when verboten elements make it past filter
dt[a<=0, if(any(a<=0)) stop("a<=0 must've been passed on to j") else a, by=grp]
# Error in `[.data.table`(dt, a <= 0, if (any(a <= 0)) \\
# stop("a<=0 must've been passed on to j") else a,  : 
#   a<=0 must've been passed on to j
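
A gentler way to see the same thing, without relying on an error, is to count the rows that j= actually receives. This is only a sketch: the column name n_seen and the variables filtered and unfiltered are made up for illustration, while .N is data.table's built-in count of rows in the current group.

```r
library(data.table)

dt <- data.table(a = c(0, 1, 0, 1), grp = c("a", "a", "b", "b"))

# .N is the number of rows in the current group as seen by j=.
# With the filter in i=, j= only sees the rows that passed it.
filtered <- dt[a > 0, .(n_seen = .N), by = grp]

# With i= left empty, j= sees every row of each group.
unfiltered <- dt[, .(n_seen = .N), by = grp]

print(filtered)
print(unfiltered)
```

Each group holds two rows, but j= reports one row per group when a>0 is in i= and two when it is not.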

To perform the filtering operation second, just place it in a second call to [.data.table():

dt[,tot:=sum(a),by=grp][a>0,]
#    a grp tot
# 1: 1   a   1
# 2: 1   b   1
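
Note that := adds the tot column to dt by reference, so dt itself is changed by the call above. If that side effect is unwanted, one possible variant (a sketch, not the only way) is to build the intermediate result with .() and filter that instead. The data in dt2 below is hypothetical, chosen so that the two orders give visibly different totals; the names dt2, filter_first and filter_second are also made up for illustration.

```r
library(data.table)

# Hypothetical data: negative values make the two orders differ.
dt2 <- data.table(a = c(-1, 1, -2, 5), grp = c("a", "a", "b", "b"))

# Filter first: sum(a) only sees the rows with a > 0.
filter_first <- dt2[a > 0, .(tot = sum(a)), by = grp]

# Process first: sum(a) sees every row of the group, and the filter
# is applied to the result in a second, chained call.
# .() returns a new table, so dt2 is left unmodified.
filter_second <- dt2[, .(a, tot = sum(a)), by = grp][a > 0]

print(filter_first)
print(filter_second)
```

Here filter_first gives totals of 1 and 5, while filter_second gives 0 and 3, because its totals were computed before the negative rows were dropped.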
answered Sep 28 '22 17:09 by Josh O'Brien