Grouping in data.table: how to get more than 1 column of results?

Tags:

r

data.table

I have a data.table object like this one

library(data.table)

a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L), 
                    SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L), 
                    PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4), 
                    RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)),
                   .Names = c("PERMNO", "SHROUT", "PRC", "RET"), 
               class = c("data.table", "data.frame"), row.names = c(NA, -6L))

setkey(a,PERMNO)

and I need to perform a number of calculations by PERMNO, but for this example let's suppose there are only 2:

mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]

which produce

> mktcap
     PERMNO       V1
[1,]  10006 8740.375
[2,]  10015  500.500
[3,]  20000  800.000

> sqret
     PERMNO        V1
[1,]  10006 5.000e-05
[2,]  10015 2.501e-03
[3,]  20000 1.361e-05

I would like to combine the two functions into one, to produce a matrix (or data.table, data.frame, whatever) with 3 columns, the first with the PERMNOs, the second with mktcap and the third with sqret.

The problem is that this grouping function (i.e. variable[ , function(), by= ]) seems to only produce results with two columns, one with the keys and one with results.

This is my attempt (one of many) to produce what I want:

comb.fun <- function(datai) {
     mktcap <- as.matrix(tail(datai[,1],n=1)*tail(datai[,2],n=1),ncol=1)
     sqret <- as.matrix(sum(datai[,3]^2),ncol=1)
     return(c(mktcap,sqret))
}   

myresults <- a[, comb.fun(cbind(SHROUT,PRC,RET)), by=PERMNO]

which produces

     PERMNO           V1
[1,]  10006 8.740375e+03
[2,]  10006 5.000000e-05
[3,]  10015 5.005000e+02
[4,]  10015 2.501000e-03
[5,]  20000 8.000000e+02
[6,]  20000 1.361000e-05

(the results are all there, but they were forced into one column). No matter what I try, I cannot get grouping to return a matrix with more than two columns (or more than one column of results).

Is it possible to get two or more columns of results with grouping in data.table?

Vivi asked Jun 27 '12 18:06


2 Answers

The answer (using list() to collect the several desired summary stats) is there in the excellent Examples section of the ?data.table help file. (It's about 20 lines up from the bottom).

out <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)),
         by=PERMNO]

out
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
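
A minimal sketch of why list() makes the difference, assuming a reasonably recent data.table version: each element of the list returned by j becomes its own column, whereas a vector built with c() is stacked into rows. The table dt below is an assumed reconstruction of a from the question, built with data.table() instead of structure().

```r
library(data.table)

# Same values as `a` in the question (reconstructed here for a self-contained example)
dt <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                 SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                 PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                 RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))

# c() returns one length-2 vector per group: two rows per PERMNO, one result column
stacked <- dt[, c(tail(SHROUT, 1) * tail(PRC, 1), sum(RET^2)), by = PERMNO]

# list() returns two length-1 elements per group: one row per PERMNO, two result columns
wide <- dt[, list(mktcap = tail(SHROUT, 1) * tail(PRC, 1),
                  sqret  = sum(RET^2)),
           by = PERMNO]

dim(stacked)  # 6 rows, 2 columns
dim(wide)     # 3 rows, 3 columns
```

The stacked result mirrors what the question's comb.fun produced, since that function also returned c(mktcap, sqret).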

Edit:

In the comments below, Matthew Dowle describes a simple way to clean up code in which the j argument in calls like x[i,j,by] is getting awkwardly long.

Implementing his suggestion on the call above, you could instead do:

## 1) Use quote() to make an expression object out of the statement passed to j
mm <- quote(list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)))

## 2) Use eval() to evaluate it as if it had been typed directly in the call
a[ , eval(mm), by=PERMNO]
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
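
If you would rather keep the calculations in a function, as in the question, the same idea should apply: have the function return a named list instead of a vector. This is only a sketch; comb.list is a hypothetical helper name, and dt is an assumed reconstruction of a from the question.

```r
library(data.table)

# Same values as `a` in the question (reconstructed here for a self-contained example)
dt <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                 SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                 PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                 RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))

# Hypothetical helper: returns a named list, so each element becomes one column
comb.list <- function(shrout, prc, ret) {
  list(mktcap = tail(shrout, 1) * tail(prc, 1),  # last shares outstanding * last price
       sqret  = sum(ret^2))                      # sum of squared returns
}

# The columns are passed as plain vectors; j evaluates to a list for each group
res <- dt[, comb.list(SHROUT, PRC, RET), by = PERMNO]
res
```

Passing the columns as separate arguments avoids the cbind() step, which coerces everything into a single numeric matrix.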
Josh O'Brien answered Nov 18 '22 16:11


How about computing each result separately and merging them:

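comb.fun <- function(a) {
 # Name each result column so the two tables do not both end up with a column called V1
 mktcap <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1)),by=PERMNO]
 sqret <- a[, list(sqret = sum(RET^2)),by=PERMNO]

 # Join the two per-group results explicitly on the grouping column
 return(merge(mktcap,sqret,by="PERMNO"))
} 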
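
As a rough check, a merge-based version and the single list() call should give the same numbers. The sketch below uses a hypothetical function name, comb_merge, and an assumed reconstruction of a called dt; the result columns are named and the join column is given explicitly so the merge does not depend on keys.

```r
library(data.table)

# Same values as `a` in the question (reconstructed here for a self-contained example)
dt <- data.table(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
                 SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
                 PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
                 RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031))

# Hypothetical merge-based helper: two grouped passes, then a join on PERMNO
comb_merge <- function(x) {
  mktcap <- x[, list(mktcap = tail(SHROUT, 1) * tail(PRC, 1)), by = PERMNO]
  sqret  <- x[, list(sqret = sum(RET^2)), by = PERMNO]
  merge(mktcap, sqret, by = "PERMNO")
}

merged <- comb_merge(dt)

# Single grouped pass using list() in j
single <- dt[, list(mktcap = tail(SHROUT, 1) * tail(PRC, 1),
                    sqret  = sum(RET^2)),
             by = PERMNO]

# Compare the result columns of the two approaches
all.equal(merged$mktcap, single$mktcap)
all.equal(merged$sqret, single$sqret)
```

The merge version groups the data twice and then joins, so the single list() call is likely the simpler choice when all summaries share the same by.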
shhhhimhuntingrabbits answered Nov 18 '22 15:11