Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data.table: Proper way to do create a conditional variable when column names are not known?

My question relates to the creation of a variable which depends upon other columns within a data.table when none of the variable names are known in advance.

Below is a toy example where I have 5 rows and the new variable should be 1 when the condition is equal to A and 4 elsewise.

library(data.table)
DT <- data.table(Con = c("A","A","B","A","B"),
                 Eval_A = rep(1,5),
                 Eval_B = rep(4,5))
Col1 <- "Con"
Col2 <- "Eval_A"
Col3 <- "Eval_B"
Col4 <- "Ans"

The code below works but feels like I'm misusing the package!

DT[,Col4:=ifelse(DT[[Col1]]=="A",
                 DT[[Col2]],
                 DT[[Col3]]),with=FALSE]

Update: Thanks, I did some quick timing of the answers below. Once on a data.table with 5 million rows and only the relevant columns and again after adding 10 non relevant columns, below are the results:

+-------------------------+---------------------+------------------+
|         Method          | Only relevant cols. | With extra cols. |
+-------------------------+---------------------+------------------+
| List method             | 1.8                 | 1.91             |
| Grothendieck - get/if   | 26.79               | 30.04            |
| Grothendieck - get/join | 0.48                | 1.56             |
| Grothendieck - .SDCols  | 0.38                | 0.79             |
| agstudy - Substitute    | 2.03                | 1.9              |
+-------------------------+---------------------+------------------+

Look's like .SDCols is best for speed and using substitute for easy to read code.

like image 533
Dominic Edwards Avatar asked Jul 03 '13 14:07

Dominic Edwards


1 Answers

1. get/if Try using get :

DT[, (Col4) := if (get(Col1) == "A") get(Col2) else get(Col3), by = 1:nrow(DT)]

2. get/join or try this approach:

setkeyv(DT, Col1)
DT[, (Col4):=get(Col3)]["A", (Col4):=get(Col2)]

3. .SDCols or this:

setkeyv(DT, Col1)
DT[, (Col4):=.SD, .SDcols = Col3]["A", (Col4):=.SD, .SDcols = Col2]

UPDATE: Added some additional approaches.

like image 184
G. Grothendieck Avatar answered Oct 25 '22 14:10

G. Grothendieck