I have the following code using ddply from the plyr package:
ddply(mtcars,.(cyl),transform,freq=length(cyl))
The data.table version of this is:
DT<-data.table(mtcars)
DT[,freq:=.N,by=cyl]
How can I extend this when I have more than one function? For example, I want to compute two summaries per group, with both ddply and data.table:
ddply(mtcars,.(cyl),transform,freq=length(cyl),sum=sum(mpg))
DT[,list(freq=.N,sum=sum(mpg)),by=cyl]
But data.table gives me only three columns: cyl, freq, and sum. I can work around this by listing every other column explicitly:
DT[,list(freq=.N,sum=sum(mpg),mpg,disp,hp,drat,wt,qsec,vs,am,gear,carb),by=cyl]
But I have a large number of variables in my real data, and I want all of them to be kept, as in ddply(...transform....). Is there a shortcut in data.table, like using := when we have only one function (as above), or something like paste(names(mtcars),collapse=",") within data.table?
Note: I also have a large number of functions to run, so I can't repeat := many times (though I would prefer that approach if lapply can be applied here).
Use the backquoted (functional) form of :=, which adds several columns at once and keeps all existing columns, like this:
DT[ , `:=`( freq = .N , sum = sum(mpg) ) , by=cyl ]
head( DT , 3 )
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb freq   sum
#1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    7 138.2
#2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    7 138.2
#3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   11 293.3
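To make the difference between the two forms concrete, here is a minimal sketch that contrasts list() in j (which aggregates) with `:=` (which adds columns by reference). The names agg and DT2 are only illustrative, and copy() is used so the original table is left untouched.

```r
library(data.table)

DT <- as.data.table(mtcars)

# list() in j aggregates: one row per group, only the grouping and named columns
agg <- DT[, list(freq = .N, sum = sum(mpg)), by = cyl]
dim(agg)   # 3 rows, 3 columns

# `:=` adds columns by reference: every row and every original column is kept
DT2 <- copy(DT)   # illustrative copy so DT itself is not modified
DT2[, `:=`(freq = .N, sum = sum(mpg)), by = cyl]
dim(DT2)   # 32 rows, 13 columns (the 11 original columns plus freq and sum)
```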
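Also useful in some situations, when the new column names are stored in a character vector:
newvars <- c("freq","sum")
# Wrap the vector in parentheses so it is evaluated as the column names;
# by=cyl keeps the per-group computation from the example above
DT[, (newvars) := list(.N,sum(mpg)), by=cyl]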
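For the case in the question where many functions need to be applied, one possible approach is to combine a character vector of column names with lapply. This is only a sketch: the list funs and its names are hypothetical, and it assumes every function is applied to the same column (mpg) and returns a single value per group.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Hypothetical named list of summary functions; the names become the new columns
funs <- list(freq = length, sum_mpg = sum, mean_mpg = mean, max_mpg = max)

# Apply each function to mpg within each cyl group and assign all results at once;
# each single-value result is recycled across the rows of its group
DT[, (names(funs)) := lapply(funs, function(f) f(mpg)), by = cyl]

head(DT, 3)
```

Functions that take different columns or extra arguments would need a different structure, for example a list of small wrapper functions.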