How to filter an R data.table by index and condition

Tags: r, data.table

Please have a look at the following sample code.

library(data.table)
DT <- data.table(1:15, 0, rbinom(15, 2, 0.5))

I can filter by condition DT[V3 == 1,] or select rows by index DT[1:5,].

How can I do both? In the following code, the row index 1:5 seems to be ignored:

DT[V3 == 1 & 1:5]

I could do DT[1:5,][V3 == 1], but then, for example, I wouldn't be able to modify the filtered rows:

DT[1:5,][V3 == 1, V2 := 1]
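The reason the chained form has no effect is that DT[1:5, ] returns a new data.table, so the := in the second bracket updates that temporary copy rather than DT itself. A minimal sketch, assuming hypothetical fixed values for V3 in place of the random rbinom() draw so the outcome is reproducible:

```r
library(data.table)

# Hypothetical fixed data: V3 is 1 in odd rows and 0 in even rows
DT <- data.table(1:15, 0, rep(c(1L, 0L), length.out = 15))

# DT[1:5, ] builds a new table; := then modifies that copy, not DT
DT[1:5, ][V3 == 1, V2 := 1]

print(DT$V2)  # every value is still 0
```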

This only works with the following workaround:

DT[V3 == 1 & DT[,.I <= 5], V2 := 1]

However, this looks too data.frame-ish to me. Is there a more elegant way, and why does DT[V3 == 1 & 1:5] not work?
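On the second question: once 1:5 is combined with &, it is no longer treated as row positions. The & operator coerces the numbers to logical (every non-zero value becomes TRUE) and recycles the length-5 vector across the 15 rows, so V3 == 1 & 1:5 reduces to V3 == 1. The sketch below, again assuming hypothetical fixed values for V3, checks this and shows that a full-length logical mask built from the positions combines as intended (in_first5 is an illustrative name):

```r
library(data.table)

# Hypothetical fixed data instead of rbinom(), so the result is reproducible
DT <- data.table(1:15, 0, rep(c(1L, 0L), length.out = 15))

# Non-zero numbers coerce to TRUE
print(as.logical(1:5))  # TRUE TRUE TRUE TRUE TRUE

# & recycles the length-5 vector to length 15 (no warning: 15 is a multiple of 5)
mask_combined <- DT$V3 == 1 & 1:5
mask_cond     <- DT$V3 == 1
print(identical(mask_combined, mask_cond))  # TRUE: the positions 1:5 played no role

# A length-15 logical mask for the positions does combine as intended
in_first5 <- seq_len(nrow(DT)) %in% 1:5
print(which(DT$V3 == 1 & in_first5))  # 1 3 5
```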

asked Mar 07 '16 09:03 by kato-m


1 Answer

Here's a faster way for @akrun's example:

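library(data.table)
library(microbenchmark)

set.seed(24)
DT <- data.table(1:1e6, 0, rbinom(1e6, 2, 0.5))
DT1 <- copy(DT)  # separate copies, since := modifies by reference
DT2 <- copy(DT)

microbenchmark(
    # subset by index first, then test the condition on those 5 values
    sequential = DT1[which(V3[1:5] == 1L), V2 := 1],
    # test the condition on all rows, then intersect with the index
    set_ops    = DT2[intersect(which(V3 == 1), 1:5), V2 := 1],
    times = 1, unit = "relative"
)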

# Unit: relative
#        expr      min       lq     mean   median       uq      max neval
#  sequential  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000     1
#     set_ops 55.43582 55.43582 55.43582 55.43582 55.43582 55.43582     1

The first approach is "sequential" in the sense that it subsets by index before evaluating the condition, so V3 == 1L is computed on only the 5 indexed values instead of on all 1e6 rows.
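To check that the two benchmarked expressions select the same rows, one can compute the row indices each one passes to i without assigning anything. A small sketch, assuming the same 1e6-row table as above (seq_rows and set_rows are illustrative names). Note that which(V3[1:5] == 1L) returns positions within the first five values; these coincide with row numbers here only because the index starts at row 1.

```r
library(data.table)

set.seed(24)
DT <- data.table(1:1e6, 0, rbinom(1e6, 2, 0.5))

# Sequential: compare only the 5 indexed values, then keep the matching positions
seq_rows <- DT[, which(V3[1:5] == 1L)]

# Set operations: compare all 1e6 values, then intersect with the index
set_rows <- DT[, intersect(which(V3 == 1), 1:5)]

print(identical(seq_rows, set_rows))
```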

The general form, for an arbitrary condition cond and index vector indx, is:

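cond <- quote(V3 == 1)  # condition to test
indx <- 1:5             # candidate row numbers

# Inner call: evaluate cond on the indexed rows only, then map the hits
# back to row numbers of the full table via indx[...]
DT[DT[indx, indx[eval(cond)]], V2 := 1]
# or, with set()
set(DT, i = DT[indx, indx[eval(cond)]], j = "V2", value = 1)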
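A usage sketch of the general form with a non-contiguous index, on hypothetical fixed data. The condition is written inline here rather than via quote()/eval() to keep the sketch minimal. Inside DT[indx, ...], the condition is evaluated on the indexed rows only, so it yields a logical vector aligned with indx; indx[...] then maps those hits back to row numbers of the full table, which is what the outer update needs. The index is kept integer because set() expects integer row numbers.

```r
library(data.table)

# Hypothetical fixed data: V3 is 1 in odd rows and 0 in even rows
DT <- data.table(1:15, 0, rep(c(1L, 0L), length.out = 15))

indx <- c(2L, 3L, 8L, 9L, 12L)  # hypothetical non-contiguous row numbers

# Row numbers (in the full table) among indx where V3 == 1
rows <- DT[indx, indx[V3 == 1]]
print(rows)  # 3 9

# Update only those rows, by reference
set(DT, i = rows, j = "V2", value = 1)
print(which(DT$V2 == 1))  # 3 9
```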
answered Oct 05 '22 09:10 by Frank