
How to avoid vector scan in data.table

Tags: r, data.table

I have a data.table X in which I would like to create a variable based on two character variables:

   X[, varC := (VarA == "A" & !is.na(VarA)) |
               (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]

This code works, but it is very slow because it does a vector scan on two character variables. Note that I have not set the key of X to VarA and VarB. Is there a "right" way to do this in data.table?

Update 1: I don't use setkey for this transformation because I already use setkey(X, Year, ID) for other variable transformations. If I did, I would need to reset the key back to Year, ID after this transformation.

Update 2: I benchmarked my approach against Matthew's, and his is much faster:

          test replications elapsed relative user.self sys.self user.child sys.child
2 Matthew               100   3.377    1.000     2.596    0.605          0         0
1 vectorSearch          100 200.437   59.354    76.628   40.260          0         0

The only minor drawback is that calling setkey and then re-setting the key afterwards is somewhat verbose :)

AdamNYC asked Dec 01 '12 06:12


1 Answer

How about:

setkey(X,VarA,VarB)
X[,varC:=FALSE]
X["A",varC:=TRUE]
X[J("AB","B"),varC:=TRUE]

or, in one line (to save repetitions of the variable X and to demonstrate chaining):

X[,varC:=FALSE]["A",varC:=TRUE][J("AB","B"),varC:=TRUE]
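As a quick sanity check, here is a minimal sketch on made-up data (the values and the helper names expected and old_key are only for illustration). It compares the keyed update against the vector scan from the question, and restores the original Year, ID key afterwards using key() and setkeyv().

```r
library(data.table)

# Toy data (hypothetical values), including NAs in both character columns
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L, 2012L, 2012L),
  ID   = 1:6,
  VarA = c("A", "AB", "AB", NA, "B", "A"),
  VarB = c("X", "B", "C", "B", "B", NA)
)
setkey(X, Year, ID)

# Reference result: the vector scan from the question
expected <- X[, (VarA == "A" & !is.na(VarA)) |
                (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]

old_key <- key(X)                # remember the current key: Year, ID
setkeyv(X, c("VarA", "VarB"))    # re-key (and re-sort) by VarA, VarB
X[, varC := FALSE]
X[J("A"), varC := TRUE]          # binary search on the first key column
X[J("AB", "B"), varC := TRUE]    # binary search on both key columns
setkeyv(X, old_key)              # restore the original key and row order

identical(X$varC, expected)
```

Saving the key in a variable keeps the "setkey then re-setkey" step to two short lines, at the cost of sorting the table twice.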

To avoid setting the key, as requested, how about a manual secondary key:

S = setkey(X[,list(VarA,VarB,i=seq_len(.N))],VarA,VarB)
X[,varC:=FALSE]
X[S["A",i][[2]],varC:=TRUE]
X[S[J("AB","B"),i][[3]],varC:=TRUE]
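The [[2]] and [[3]] above pick the i column out of a join result that also carries the key columns. As far as I recall, in later data.table releases (1.9.4 onward, if memory serves) a join with j evaluates j once over the matched rows, so S[J("A"), i] returns the row numbers directly and the [[ ]] step should be dropped. A sketch under that assumption, on made-up data (rows_A and rows_ABB are illustrative names):

```r
library(data.table)

# Toy data (hypothetical values)
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L),
  ID   = 1:4,
  VarA = c("A", "AB", "AB", "B"),
  VarB = c("X", "B", "C", "B")
)
setkey(X, Year, ID)

# Manual secondary key: a lookup table keyed by (VarA, VarB),
# where column i holds the row numbers of X
S <- setkey(X[, list(VarA, VarB, i = seq_len(.N))], VarA, VarB)

X[, varC := FALSE]
# nomatch = 0L drops non-matching lookups instead of returning NA
rows_A   <- S[J("A"), i, nomatch = 0L]
rows_ABB <- S[J("AB", "B"), i, nomatch = 0L]
X[c(rows_A, rows_ABB), varC := TRUE]   # X keeps its Year, ID key
```

X is never re-sorted here; only the small lookup table S is keyed by VarA and VarB.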

Now clearly, that syntax is ugly. So FR#1007 ("Build in secondary keys") is to build that into the syntax; e.g.,

set2key(X,VarA,VarB)
X[...some way to specify which key to join to..., varC:=TRUE]

In the meantime it's possible, just manually, as shown above.
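For readers on a recent data.table, the on= argument to [ appears to cover this case without touching the key. The sketch below assumes a version that supports on= and setindex() (1.9.6 or later, as far as I know) and uses made-up data; it is one possible way to write the update, not necessarily what FR#1007 ended up as.

```r
library(data.table)

# Toy data (hypothetical values)
X <- data.table(
  Year = c(2010L, 2010L, 2011L, 2011L, 2012L),
  ID   = 1:5,
  VarA = c("A", "AB", "AB", NA, "A"),
  VarB = c("X", "B", "C", "B", NA)
)
setkey(X, Year, ID)

# Optional: build a secondary index on (VarA, VarB); the primary key is untouched
setindex(X, VarA, VarB)

X[, varC := FALSE]
# Ad hoc joins on named columns; no setkey() on VarA/VarB needed
X[.("A"), varC := TRUE, on = "VarA"]
X[.("AB", "B"), varC := TRUE, on = c("VarA", "VarB")]

key(X)   # still Year, ID
```

Rows where VarA or VarB is NA are simply never matched by the join, so the explicit !is.na() checks from the question are not needed.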

Matt Dowle answered Oct 20 '22 05:10