I have a data.table object like this one:
library(data.table)
a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4),
RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)),
.Names = c("PERMNO", "SHROUT", "PRC", "RET"),
class = c("data.table", "data.frame"), row.names = c(NA, -6L))
setkey(a,PERMNO)
and I need to perform a number of calculations by PERMNO, but in this example let's suppose there are only two:
mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]
which produce
> mktcap
PERMNO V1
[1,] 10006 8740.375
[2,] 10015 500.500
[3,] 20000 800.000
> sqret
PERMNO V1
[1,] 10006 5.000e-05
[2,] 10015 2.501e-03
[3,] 20000 1.361e-05
I would like to combine the two functions into one, to produce a matrix (or data.table, data.frame, whatever) with 3 columns: the first with the PERMNOs, the second with mktcap and the third with sqret.
The problem is that this grouping function (i.e. variable[ , function(), by= ]) seems to only produce results with two columns, one with the keys and one with results.
This is my attempt (one of many) to produce what I want:
comb.fun <- function(datai) {
mktcap <- as.matrix(tail(datai[,1],n=1)*tail(datai[,2],n=1),ncol=1)
sqret <- as.matrix(sum(datai[,3]^2),ncol=1)
return(c(mktcap,sqret))
}
myresults <- a[, comb.fun(cbind(SHROUT,PRC,RET)), by=PERMNO]
which produces
PERMNO V1
[1,] 10006 8.740375e+03
[2,] 10006 5.000000e-05
[3,] 10015 5.005000e+02
[4,] 10015 2.501000e-03
[5,] 20000 8.000000e+02
[6,] 20000 1.361000e-05
(the results are all there, but they were forced into one column). No matter what I try, I cannot get grouping to return a matrix with more than two columns (or more than one column of results).
Is it possible to get two or more columns of results with grouping in data.table?
The answer (using list() to collect the several desired summary stats) is there in the excellent Examples section of the ?data.table help file. (It's about 20 lines up from the bottom).
out <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
sqret = sum(RET^2)),
by=PERMNO]
out
# PERMNO mktcap sqret
# 1: 10006 8740.375 5.000e-05
# 2: 10015 500.500 2.501e-03
# 3: 20000 800.000 1.361e-05
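The reason this works is that each element of the list returned in j becomes its own column, whereas a plain vector (as returned by c() in the question) is stacked into a single column. As a minimal sketch of the same idea, assuming a reasonably recent data.table version, the call can also be written with the .() alias for list() and the .N symbol to pick the last row of each group. The names a2 and out2 are only illustrative and are not part of the original question:

```r
library(data.table)

# Same data as in the question, built with data.table() instead of structure()
a2 <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)
)

# .() is an alias for list() inside j; .N is the number of rows in the
# current group, so SHROUT[.N] is the last SHROUT value of that group
out2 <- a2[, .(mktcap = SHROUT[.N] * PRC[.N],
               sqret  = sum(RET^2)),
           by = PERMNO]
out2
```

Every extra summary statistic is then just one more named element inside .().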
Edit:
In the comments below, Matthew Dowle describes a simple way to clean up code in which the j argument in calls like x[i,j,by] is getting awkwardly long.
Implementing his suggestion on the call above, you could instead do:
## 1) Use quote() to make an expression object out of the statement passed to j
mm <- quote(list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
sqret = sum(RET^2)))
## 2) Use eval() to evaluate it as if it had been typed directly in the call
a[ , eval(mm), by=PERMNO]
# PERMNO mktcap sqret
# 1: 10006 8740.375 5.000e-05
# 2: 10015 500.500 2.501e-03
# 3: 20000 800.000 1.361e-05
How about:
comb.fun <- function(a) {
mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]
return(merge(mktcap,sqret))
}
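One thing to watch with the merge approach is that both intermediate results carry a column called V1, so the outcome depends on which columns merge() decides to join on and how it renames the duplicates. A hedged variant, assuming a reasonably recent data.table version, names each result column up front and states the join column explicitly. The names comb.fun2, a3 and res are only illustrative:

```r
library(data.table)

# Same data as in the question
a3 <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)
)

comb.fun2 <- function(dt) {
  # Name each summary column so the two tables do not share a V1 column
  mktcap <- dt[, .(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1)), by = PERMNO]
  sqret  <- dt[, .(sqret = sum(RET^2)), by = PERMNO]
  # Join on PERMNO only, rather than relying on merge()'s default choice
  merge(mktcap, sqret, by = "PERMNO")
}

res <- comb.fun2(a3)
res
```

This groups the data twice and then joins, so it does more work than a single grouped call with list(), but it can be handy when the two summaries are computed in separate places.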