I have a data.table X in which I would like to create a variable based on two character variables:
X[, varC := (VarA == "A" & !is.na(VarA)) |
            (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]
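As a side note, the `!is.na()` guards can be folded away, because `%in%` returns FALSE (never NA) for missing values. The sketch below uses hypothetical sample data with the question's column names. It is still a vector scan, so it tidies the expression but does not address the speed problem.

```r
library(data.table)

# Hypothetical sample data; column names follow the question
X <- data.table(Year = c(2010L, 2010L, 2011L, 2011L, 2012L),
                ID   = 1:5,
                VarA = c("A", "AB", "AB", NA, "B"),
                VarB = c("x", "B", "C", "B", NA))

# Same logic as the original expression: %in% maps NA to FALSE,
# so no explicit !is.na() checks are needed
X[, varC := VarA %in% "A" | (VarA %in% "AB" & VarB %in% "B")]
X$varC
```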
This code works, but it is very slow because it performs a vector scan on two character columns. Note that I haven't keyed X by VarA and VarB. Is there a "right" way to do this in data.table?
Update 1: I don't use setkey for this transformation because I already use setkey(X, Year, ID) for other variable transformations. If I did, I would need to reset the key back to Year, ID afterwards.
Update 2: I benchmarked my approach against Matthew's, and his is much faster:
  test          replications  elapsed  relative  user.self  sys.self  user.child  sys.child
2 Matthew                100    3.377     1.000      2.596     0.605           0          0
1 vectorSearch           100  200.437    59.354     76.628    40.260           0          0
The only minor drawback is that setting the key and then resetting it afterwards is somewhat verbose :)
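One way to keep the re-keying short is to save the current key with `key()` and restore it with `setkeyv()`, so the original key columns never have to be typed twice. A minimal sketch on hypothetical data keyed by Year, ID; note that restoring the key re-sorts X.

```r
library(data.table)

# Hypothetical data keyed by Year, ID as in the question
X <- data.table(Year = c(2011L, 2010L, 2010L),
                ID   = c(1L, 2L, 1L),
                VarA = c("AB", "A", "B"),
                VarB = c("B", "x", "C"))
setkey(X, Year, ID)

old_key <- key(X)              # remember the current key, c("Year", "ID")
setkey(X, VarA, VarB)          # temporarily key by the lookup columns
X[, varC := FALSE]
X["A", varC := TRUE]           # VarA == "A"
X[J("AB", "B"), varC := TRUE]  # VarA == "AB" & VarB == "B"
setkeyv(X, old_key)            # restore the original key (re-sorts by Year, ID)
```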
How about:
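setkey(X, VarA, VarB)          # sort X by VarA, VarB so joins use binary search
X[, varC := FALSE]             # default for every row
X["A", varC := TRUE]           # VarA == "A" (any VarB)
X[J("AB", "B"), varC := TRUE]  # VarA == "AB" & VarB == "B"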
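To check that the keyed joins give the same result as the vector scan, the sketch below compares both on hypothetical random data containing NAs, with the second join written to match the question's condition VarA == "AB" & VarB == "B". The helper column `row` is an assumption added only to restore the original row order for the comparison.

```r
library(data.table)

set.seed(1)
n <- 1e5
# Hypothetical random data, including NAs in both columns
X <- data.table(row  = seq_len(n),
                VarA = sample(c("A", "AB", "B", NA), n, replace = TRUE),
                VarB = sample(c("B", "C", NA), n, replace = TRUE))

# Reference result from the original vector scan
X[, ref := (VarA == "A" & !is.na(VarA)) |
           (VarA == "AB" & VarB == "B" & !is.na(VarA) & !is.na(VarB))]

# Keyed binary-search version
setkey(X, VarA, VarB)
X[, varC := FALSE]
X["A", varC := TRUE]
X[J("AB", "B"), varC := TRUE]

# Return to the original row order before comparing
setkey(X, row)
identical(X$varC, X$ref)
```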
Or, in one line (to avoid repeating the variable X, and to demonstrate chaining):
X[, varC := FALSE]["A", varC := TRUE][J("AB", "B"), varC := TRUE]
To avoid setting the key, as requested, how about a manual secondary key:
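# S: copy of VarA, VarB keyed by those columns, plus each row's position i in X
S = setkey(X[, list(VarA, VarB, i = seq_len(.N))], VarA, VarB)
X[, varC := FALSE]
# Current data.table: S[<join>, i] returns the matched row numbers as a vector.
# Older versions also returned the join columns, hence the former [[2]] / [[3]].
X[S[J("A"), i, nomatch = 0L], varC := TRUE]
X[S[J("AB", "B"), i, nomatch = 0L], varC := TRUE]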
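A self-contained sketch of the manual secondary key, on hypothetical data keyed by Year, ID, showing that X's own key and row order are left untouched. It assumes a current data.table release, where `S[<join>, i]` returns just the `i` vector; `nomatch = 0L` is added so that a value with no match yields no row numbers rather than NA.

```r
library(data.table)

# Hypothetical data keyed by Year, ID; this key should survive the update
X <- data.table(Year = c(2011L, 2010L, 2010L, 2012L),
                ID   = c(1L, 2L, 1L, 3L),
                VarA = c("AB", "A", "B", "AB"),
                VarB = c("B", "x", "C", "C"))
setkey(X, Year, ID)

# Sorted copy of the lookup columns, remembering each row's position in X
S <- setkey(X[, list(VarA, VarB, i = seq_len(.N))], VarA, VarB)

X[, varC := FALSE]
X[S[J("A"), i, nomatch = 0L], varC := TRUE]        # rows with VarA == "A"
X[S[J("AB", "B"), i, nomatch = 0L], varC := TRUE]  # rows with VarA == "AB" & VarB == "B"
key(X)
```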
Clearly that syntax is ugly, so feature request FR#1007 ("Build in secondary keys") is about building this into the syntax; e.g.,
set2key(X, VarA, VarB)
X[...some way to specify which key to join to..., varC:=TRUE]
In the meantime, it's possible to do this manually, as shown above.
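Later data.table releases added functionality along these lines: the `on=` argument for ad hoc joins on non-key columns and `setindex()` for secondary indices. A minimal sketch, assuming a version that supports `on=` (1.9.6 or later, as far as I know), again on hypothetical data; the key (Year, ID) is not changed and X is not re-sorted.

```r
library(data.table)

# Hypothetical data keyed by Year, ID
X <- data.table(Year = c(2011L, 2010L, 2010L, 2012L),
                ID   = c(1L, 2L, 1L, 3L),
                VarA = c("AB", "A", "B", "AB"),
                VarB = c("B", "x", "C", "C"))
setkey(X, Year, ID)

setindex(X, VarA, VarB)  # optional secondary index; does not reorder X or change its key

X[, varC := FALSE]
X["A", on = "VarA", varC := TRUE]                         # VarA == "A"
X[.("AB", "B"), on = c("VarA", "VarB"), varC := TRUE]     # VarA == "AB" & VarB == "B"
key(X)
```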