I am trying to use data.table to recode a variable based on certain conditions. My original dataset has around 30M records and, once all variables have been created, around 130 variables. I tried the methods suggested in "conditional statements in data.table" (M1) and in "data.table: Proper way to do create a conditional variable when column names are not known?" (M2).
My goal is to get the equivalent of the code below, but written with data.table:
samp$lf5 <- samp$loadfactor5
samp$lf5 <- with(samp, ifelse(loadfactor5 < 0, 0, lf5))
I will admit that I don't understand .SD and .SDCols very well, so I might be using it wrong. The code and errors from (M1) and (M2) are given below and the sample dataset is here: http://goo.gl/Jp97Wn
(M1)
samp[,lf5 = if(loadfactor5 <0) 0 else loadfactor5]
Error Message
Error in `[.data.table`(samp, , lf5 = if (loadfactor5 < 0) 0 else loadfactor5) :
unused argument (lf5 = if (loadfactor5 < 0) 0 else loadfactor5)
When I do this:
samp[,list(lf5 = if(loadfactor5 <0) 0 else loadfactor5)]
it gives lf5 as a list but not as part of the samp data.table and does not really apply the condition as lf5 still has values less than 0.
(M2)
Col1 <- "loadfactor5"
Col2 <- "lf5"
setkeyv(samp,Col1)
samp[,(Col2) :=.SD,.SDCols = Col1][Col1<0,(Col2) := .SD, .SDcols = 0]
I get the following error
Error in `[.data.table`(samp, , `:=`((Col2), .SD), .SDCols = Col1) :
unused argument (.SDCols = Col1)
Any insights on how to finish this would be appreciated. My dataset has 30M records, so I am hoping to use data.table to really cut the run time down.
Thanks,
Krishnan
The if statement takes a condition; if the condition evaluates to TRUE, the R code associated with the if statement is executed. The general form is if (condition) { expr }: the condition to check appears inside parentheses, and the R code to be executed when the condition is TRUE follows in curly brackets (expr).
In R there are two forms of the if-else conditional: the if statement, which expects a condition of length one, and the ifelse() function, which is vectorised and tests every element of a vector.
To run an if-then statement in R, we use if () {}. It has two main elements: a logical test in the parentheses and conditional code in curly braces. The code in the curly braces is conditional because it is evaluated only if the logical test in the parentheses is TRUE.
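To make the difference between if and ifelse() concrete, here is a minimal base R sketch. The vector x is made-up illustration data, not taken from the question's dataset.

```r
# Hypothetical illustration data (not from the sample dataset)
x <- c(-1.5, 0.5, -0.2, 2)

# if () expects a single TRUE/FALSE, so it is applied to one element here
first <- if (x[1] < 0) 0 else x[1]

# ifelse() tests every element and returns a vector of the same length
recoded <- ifelse(x < 0, 0, x)

print(first)
print(recoded)
```

Passing the whole vector to if () is what goes wrong in (M1): older versions of R test only the first element and warn, and from R 4.2.0 onward a condition of length greater than one is an error.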
Answer provided by eddi and included here for the sake of completeness.
samp[, lf5 := ifelse(loadfactor5 < 0, 0, loadfactor5)]
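Two things make this line work where (M1) and (M2) failed. First, := adds or updates a column by reference inside j, whereas lf5 = ... in (M1) is parsed as a named argument to `[.data.table`, hence the "unused argument" error. Second, ifelse() is vectorised, while if only tests a single value. The sketch below is one way to check this on a small made-up table standing in for the linked sample. It also shows two alternatives that should give the same result, and a version of (M2) with the column names held in variables. The extra column names (lf5_sub, lf5_max, lf5_named) are hypothetical and exist only so the results can be compared side by side.

```r
library(data.table)

# Made-up stand-in for the sample data; only loadfactor5 matters here
samp <- data.table(id = 1:5, loadfactor5 = c(-2, 0.3, -0.1, 1.2, 0))

# eddi's approach: create lf5 by reference with a vectorised condition
samp[, lf5 := ifelse(loadfactor5 < 0, 0, loadfactor5)]

# Alternative 1: copy the column, then overwrite only the negative rows
samp[, lf5_sub := loadfactor5]
samp[loadfactor5 < 0, lf5_sub := 0]

# Alternative 2: clamp at zero with pmax()
samp[, lf5_max := pmax(loadfactor5, 0)]

# Column names held in variables, as in (M2)
Col1 <- "loadfactor5"
Col2 <- "lf5_named"
samp[, (Col2) := .SD, .SDcols = Col1]  # the argument is .SDcols, not .SDCols
samp[get(Col1) < 0, (Col2) := 0]       # get() looks the column up by name

print(samp)
```

In the (M2) attempt, the error comes from the capitalisation of .SDCols; the argument is spelled .SDcols. A bare Col1 < 0 in i would also compare the character string "loadfactor5" with 0 rather than the column values, which is why get(Col1) is used here. On a table with 30M rows, the subset-and-assign form (Alternative 1) may be lighter on memory than ifelse(), since it only touches the rows that need changing, though that is worth timing on the real data.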