I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.
library(data.table)
a <- data.table(
  a=as.character(rnorm(5)),
  b=as.character(rnorm(5)),
  c=as.character(rnorm(5)),
  d=as.character(rnorm(5))
)
b <- c('a','b','c','d')
With the MWE above, this:
a[,b=as.numeric(b),with=F]
works, but this:
a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F]
doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a without referring to them individually?
(In the actual data set there are tens of columns so it would be impractical)
The idiomatic approach is to use .SD and .SDcols. You can force the LHS of := to be evaluated in the parent frame, so that b is read as the vector of column names rather than as a literal column name, by wrapping it in ():
a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
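To see why the parentheses matter, here is a minimal sketch on a toy table. The names DT, cols, x and y are made up for illustration and are not from the question. It assumes a reasonably recent data.table, where a bare symbol on the left of := is taken as a literal column name.

```r
library(data.table)

# Hypothetical toy table, not the asker's data
DT <- data.table(x = 1:2, y = 3:4)
cols <- "y"

# Bare symbol on the LHS: adds a new column literally named "cols"
DT[, cols := 0L]
names(DT)

# Parenthesised LHS: cols is evaluated first, so column y is overwritten
DT[, (cols) := 0L]
DT$y
```

The first call leaves y untouched and adds a third column; only the second call updates the column that cols refers to.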
For columns 2 and 3, by position:
a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]
or
mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
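As a quick end-to-end check, the following sketch rebuilds the table from the question, converts only columns 2 and 3, and inspects the resulting classes. The set.seed() call is an addition made here purely for reproducibility; it is not part of the original example.

```r
library(data.table)

set.seed(1)  # arbitrary seed, added only so the run is reproducible
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)

# Convert only columns 2 and 3, selected by position
mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]

# Columns b and c should now be numeric; a and d stay character
sapply(a, class)
```

Because := updates by reference, no reassignment of a is needed, which also avoids copying the whole table when it has many columns.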