
Apply function across subset of columns in data.table with .SDcols

Tags:

r

data.table

I want to apply a function over a subset of variables in a data.table. In this case I'm simply changing variable types. I can do this in a few different ways in data.table, but I'm looking for a way that does not require an intermediate assignment (mycols in this example) and does not require me to specify the columns I want to change twice. Here is a simplified reproducible example:

library('data.table')
n <- 30
dt <- data.table(a=sample(1:5, n, replace=TRUE),
       b=as.character(sample(seq(as.Date('2011-01-01'), as.Date('2015-01-01'), length.out=n))),
       c1235=as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out=n))),
       d7777=as.character(sample(seq(as.Date('2012-01-01'), as.Date('2013-01-01'), length.out=n)))
)

WAY 1: this works... but it's hard-coded

mycols <- c('b', 'c1235', 'd7777')
dt1 <- dt[,(mycols):=lapply(.SD, as.Date), .SDcols=mycols]

WAY 2: this works... but I need to create an intermediate object (mycols) for it to work

mycols <- which(sapply(dt, class)=='character')
dt2 <- dt[,(mycols):=lapply(.SD, as.Date), .SDcols=mycols]

WAY 3: this works, but I need to specify this long expression twice

dt3 <- dt[,(which(sapply(dt, class)=='character')):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')]

WAY 4: this doesn't work, but I want something like it, so that I only specify the columns that make up .SDcols once. I'm looking for some way to replace (.SD):= with something that works, or to chain things together. More generally, I'd be curious whether anyone has a method for doing what WAYs 1, 2 and 3 do without an intermediate assignment that bloats the environment and without specifying the same columns twice.

dt3 <- dt[,(.SD):=lapply(.SD, as.Date), .SDcols=which(sapply(dt, class)=='character')]
ajb asked Jul 09 '15 19:07


1 Answer

Here's a one-line answer:

for (j in which(sapply(dt, class)=='character')) set(dt, i=NULL, j=j, value=as.Date(dt[[j]]))
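
To see the loop in action, here is a minimal self-contained sketch. The small table below is made-up stand-in data (not the asker's), and it uses is.character in place of the class comparison, which should select the same columns here. The column positions are computed once, in the loop header, and no intermediate object is left behind apart from the loop variable j.

```r
library(data.table)

# Small stand-in table (hypothetical data, same shape as the question's dt)
dt <- data.table(
  a     = 1:3,
  b     = c("2011-01-01", "2012-06-15", "2015-01-01"),
  c1235 = c("2012-01-01", "2012-07-01", "2013-01-01")
)

# which() returns the positions of the character columns;
# set() then overwrites each of those columns by reference
for (j in which(sapply(dt, is.character))) {
  set(dt, j = j, value = as.Date(dt[[j]]))
}

# Inspect the resulting column classes
sapply(dt, class)
```

Because set() modifies dt in place, there is nothing to assign the result to; column a is left untouched since it is not a character column.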

Here's a question where Arun and Matt each prefer set with a for loop over using .SD:

How to apply same function to every specified column in a data.table
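
If the same conversion is needed in several places, the for/set loop could be wrapped in a small helper so that the column selection lives only inside the function's own scope. This is just a sketch: convert_cols is a hypothetical name, not something data.table provides, and the example table is made up.

```r
library(data.table)

# Hypothetical helper (not part of data.table): applies `fun` in place to
# every column of `dt` for which `pred` returns TRUE.
convert_cols <- function(dt, pred, fun) {
  cols <- which(vapply(dt, pred, logical(1)))  # local to the function
  for (j in cols) set(dt, j = j, value = fun(dt[[j]]))
  invisible(dt)  # dt was modified by reference; return it invisibly
}

# Made-up example data
dt <- data.table(a = 1:2, b = c("2011-01-01", "2015-01-01"))
convert_cols(dt, is.character, as.Date)
class(dt$b)
```

The columns are specified once (through the predicate), and nothing is added to the calling environment beyond the helper itself.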

Dean MacGregor answered Oct 11 '22 07:10