
Apply a function to a subset of data.table columns, by column-indices instead of name

I'm trying to apply a function to a group of columns in a large data.table without referring to each one individually.

library(data.table)
a <- data.table(
  a = as.character(rnorm(5)),
  b = as.character(rnorm(5)),
  c = as.character(rnorm(5)),
  d = as.character(rnorm(5))
)
b <- c('a','b','c','d')

With the MWE above, this:

a[,b=as.numeric(b),with=F] 

works, but this:

a[,b[2:3]:=data.table(as.numeric(b[2:3])),with=F] 

doesn't work. What is the correct way to apply the as.numeric function to just columns 2 and 3 of a without referring to them individually?

(In the actual data set there are tens of columns, so naming each one would be impractical.)

Tahnoon Pasha asked May 28 '13 03:05



1 Answer

The idiomatic approach is to use .SD and .SDcols.

You can force the LHS of := to be evaluated in the parent frame, so that b is read as a vector of column names rather than as a literal column name, by wrapping it in ():

a[, (b) := lapply(.SD, as.numeric), .SDcols = b] 

For columns 2:3, by position:

a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3] 

or

mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
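
To check that only the intended columns were converted, the index-based form can be tried on a small table. This is a minimal sketch, not part of the original answer: the table dt, its values, and the variable idx are made up for illustration, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical data: every column starts out as character, as in the question
dt <- data.table(
  a = c("1.5", "2.5"),
  b = c("3", "4"),
  c = c("5", "6"),
  d = c("7", "8")
)

idx <- 2:3  # positions of the columns to convert

# (idx) makes := use the value of idx instead of creating a column named "idx";
# .SDcols restricts .SD to the same columns, so lapply only touches those
dt[, (idx) := lapply(.SD, as.numeric), .SDcols = idx]

# Inspect the class of each column after the in-place update
col_classes <- sapply(dt, class)
print(col_classes)
```

Because := modifies the table by reference, no reassignment of dt is needed; columns a and d should remain character while b and c become numeric.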
mnel answered Sep 28 '22 05:09