
Equivalent to ddply(...,transform,...) in data.table

I have the following code using ddply from the plyr package:

library(plyr)
ddply(mtcars,.(cyl),transform,freq=length(cyl))

The data.table version of this is:

library(data.table)
DT<-data.table(mtcars)

DT[,freq:=.N,by=cyl]

How can I extend this to more than one function? For example, I want to compute two summaries with both ddply and data.table:

ddply(mtcars,.(cyl),transform,freq=length(cyl),sum=sum(mpg))

DT[,list(freq=.N,sum=sum(mpg)),by=cyl] 

However, the data.table version returns only three columns: cyl, freq, and sum. I can get the rest by listing them explicitly:

DT[,list(freq=.N,sum=sum(mpg),mpg,disp,hp,drat,wt,qsec,vs,am,gear,carb),by=cyl]

However, I have a large number of variables in my real data, and I want all of them to be kept, as in ddply(..., transform, ...). Is there a shortcut in data.table, like using := when there is only one function (as above), or something like paste(names(mtcars),collapse=",") within data.table? Note: I also have a large number of functions to run, so I can't repeat := many times (though I would prefer that if lapply can be applied here).

asked Oct 24 '13 14:10 by Metrics


2 Answers

Use the backquoted, functional form of := to add several columns by reference in a single call, like this:

DT[ , `:=`( freq = .N , sum = sum(mpg) ) , by=cyl ]
head( DT , 3 )
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb freq   sum
#1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    7 138.2
#2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    7 138.2
#3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   11 293.3
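
For the case in the question where many functions need to be applied, one possible extension is to keep the functions in a named list and let lapply build the right-hand side. This is only a sketch: the list name funs and the column name avg are hypothetical, and it assumes every function is applied to mpg and returns a single value per group.

```r
library(data.table)

DT <- data.table(mtcars)

# Hypothetical named list of functions; the names become the new column names.
funs <- list(freq = length, sum = sum, avg = mean)

# (names(funs)) on the left-hand side is evaluated to a character vector of
# column names; lapply() returns one value per function for each cyl group,
# which := recycles across the rows of that group.
DT[, (names(funs)) := lapply(funs, function(f) f(mpg)), by = cyl]

head(DT, 3)
```

All original columns are kept, because := adds columns by reference instead of building a new aggregated table.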
answered Nov 14 '22 23:11 by Simon O'Hanlon


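Also useful when the new column names are stored in a character vector (by=cyl is needed to get the per-group values asked for in the question):

newvars <- c("freq","sum")
DT[, `:=`(eval(newvars), list(.N,sum(mpg))), by=cyl]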
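
The same kind of assignment can also be written with the left-hand side wrapped in parentheses, which tells data.table to treat newvars as a vector of column names and not as a single column called newvars. A minimal sketch, assuming grouping by cyl as in the question:

```r
library(data.table)

DT <- data.table(mtcars)
newvars <- c("freq", "sum")

# (newvars) is evaluated to c("freq", "sum"); the list on the right supplies
# one value per new column for each cyl group.
DT[, (newvars) := list(.N, sum(mpg)), by = cyl]

head(DT, 3)
```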
answered Nov 14 '22 23:11 by Michael