
Use aggregate with a function that uses data from two columns (e.g. cov or prod)

Tags: r, aggregate, zoo

I have a long daily time series with 101 columns. For each month, I would like to calculate the covariance of each of the first 100 columns with the 101st column. This would give a monthly covariance with the 101st column for each of the 100 columns, based on daily data. aggregate does what I want with functions that take a single vector, such as mean, but I can't get it to work with cov (or prod).

Please let me know if a dput of a few months would help.

> library("zoo")
> data <- read.zoo("100Size-BM.csv", header=TRUE, sep=",", format="%Y%m%d")
> head(data[, c("R1", "R2", "R3", "R100", "Mkt.RF")])
                 R1       R2       R3     R100  Mkt.RF
1963-07-01 -0.00212  0.00398 -0.00472 -0.00362 -0.0066
1963-07-02 -0.00242  0.00678  0.00068 -0.00012  0.0078
1963-07-03  0.00528  0.01078  0.00598  0.00338  0.0063
1963-07-05  0.01738 -0.00932 -0.00072 -0.00012  0.0040
1963-07-08  0.01048 -0.01262 -0.01332 -0.01392 -0.0062
1963-07-09 -0.01052  0.01048  0.01738  0.01388  0.0045

mean works great, and gives me the monthly data I want.

> mean.temp <- aggregate(data[, 1:100], as.yearmon, mean)
> head(mean.temp[, 1:3])
                    R1            R2            R3
Jul 1963  0.0003845455  7.545455e-05  0.0004300000
Aug 1963 -0.0006418182  2.412727e-03  0.0022263636
Sep 1963  0.0016250000  1.025000e-03 -0.0002600000
Oct 1963 -0.0007952174  2.226522e-03  0.0004873913
Nov 1963  0.0006555556 -5.211111e-03 -0.0013888889
Dec 1963 -0.0027066667 -1.249524e-03 -0.0005828571

But I can't get a function that uses two different columns/vectors to work.

> cov.temp <- aggregate(data[, 1:100], as.yearmon, cov(x, data[, "Mkt.RF"]))
Error in inherits(x, "data.frame") : object 'x' not found

Nor can I get it to work by wrapping cov in a function.

> f <- function(x) cov(x, data[, "Mkt.RF"])
> cov.temp <- aggregate(data[, 1:100], as.yearmon, f)
Error in cov(x, data[, "Mkt.RF"]) : incompatible dimensions

Should I do this with a for loop? I am hoping there is a more R way. Thanks!

Richard Herron asked Feb 26 '23 03:02


1 Answer

You can use the approach I wrote here, namely to do something like:

tapply(1:nrow(data), data$group, function(s) cov(data$x[s], data$y[s]))
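
Adapted to a zoo object like the one in the question, this might look like the sketch below. It is only an illustration under some assumptions: the data is simulated (two return columns instead of 100), and the names z, m, month, ret.cols, cov.R1 and cov.mat are made up for the example. The point is that the function receives row indices for one month, so it can subset both the return column and Mkt.RF to the same rows. The wrapper in the question presumably fails because x holds only one month of a column while data[, "Mkt.RF"] is still the full series.

```r
library(zoo)

# Simulated stand-in for the real data: three months of daily values,
# with two return columns (R1, R2) instead of 100, plus Mkt.RF.
set.seed(1)
dates <- seq(as.Date("1963-07-01"), as.Date("1963-09-30"), by = "day")
vals <- matrix(rnorm(length(dates) * 3, sd = 0.01), ncol = 3,
               dimnames = list(NULL, c("R1", "R2", "Mkt.RF")))
z <- zoo(vals, order.by = dates)

m <- coredata(z)                     # plain numeric matrix
month <- format(index(z), "%Y-%m")   # one grouping label per row
ret.cols <- setdiff(colnames(m), "Mkt.RF")

# One column, as in the answer: tapply over row indices, so both
# vectors are subset to the same month before calling cov.
cov.R1 <- tapply(seq_len(nrow(m)), month,
                 function(s) cov(m[s, "R1"], m[s, "Mkt.RF"]))

# All return columns at once: cov() of a matrix with a vector returns
# a one-column matrix, which [, 1] turns into a named vector.
rows.by.month <- split(seq_len(nrow(m)), month)
cov.mat <- t(sapply(rows.by.month, function(s)
  cov(m[s, ret.cols, drop = FALSE], m[s, "Mkt.RF"])[, 1]))

print(cov.mat)   # one row per month, one column per return series
```

With the real data, m would be coredata(data) and ret.cols the first 100 column names. The result is a plain matrix with month labels as row names, so it would need converting if a zoo object is wanted.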
Charles answered Mar 01 '23 15:03