My question relates to the creation of a variable which depends upon other columns within a data.table when none of the variable names are known in advance.
Below is a toy example where I have 5 rows and the new variable should be 1 when the condition is equal to A and 4 elsewise.
library(data.table)
DT <- data.table(Con = c("A","A","B","A","B"),
Eval_A = rep(1,5),
Eval_B = rep(4,5))
Col1 <- "Con"
Col2 <- "Eval_A"
Col3 <- "Eval_B"
Col4 <- "Ans"
The code below works but feels like I'm misusing the package!
DT[,Col4:=ifelse(DT[[Col1]]=="A",
DT[[Col2]],
DT[[Col3]]),with=FALSE]
Update: Thanks, I did some quick timing of the answers below. Once on a data.table with 5 million rows and only the relevant columns and again after adding 10 non relevant columns, below are the results:
+-------------------------+---------------------+------------------+
| Method | Only relevant cols. | With extra cols. |
+-------------------------+---------------------+------------------+
| List method | 1.8 | 1.91 |
| Grothendieck - get/if | 26.79 | 30.04 |
| Grothendieck - get/join | 0.48 | 1.56 |
| Grothendieck - .SDCols | 0.38 | 0.79 |
| agstudy - Substitute | 2.03 | 1.9 |
+-------------------------+---------------------+------------------+
Look's like .SDCols is best for speed and using substitute for easy to read code.
1. get/if
Try using get
:
DT[, (Col4) := if (get(Col1) == "A") get(Col2) else get(Col3), by = 1:nrow(DT)]
2. get/join or try this approach:
setkeyv(DT, Col1)
DT[, (Col4):=get(Col3)]["A", (Col4):=get(Col2)]
3. .SDCols or this:
setkeyv(DT, Col1)
DT[, (Col4):=.SD, .SDcols = Col3]["A", (Col4):=.SD, .SDcols = Col2]
UPDATE: Added some additional approaches.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With