Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cumulative sums, moving averages, and SQL "group by" equivalents in R

What's the most efficient way to create a moving average or rolling sum in R? How do you do the rolling function along with a "group by"?

like image 719
medriscoll Avatar asked Jul 23 '09 03:07

medriscoll


2 Answers

While zoo is great, sometimes there are simpler ways. If you data behaves nicely, and is evenly spaced, the embed() function effectively lets you create multiple lagged version of a time series. If you look inside the VARS package for vector auto-regression, you will see that the package author chooses this route.

For example, to calculate the 3 period rolling average of x, where x = (1 -> 20)^2:

> x <- (1:20)^2
> embed (x, 3)
      [,1] [,2] [,3]
 [1,]    9    4    1
 [2,]   16    9    4
 [3,]   25   16    9
 [4,]   36   25   16
 [5,]   49   36   25
 [6,]   64   49   36
 [7,]   81   64   49
 [8,]  100   81   64
 [9,]  121  100   81
[10,]  144  121  100
[11,]  169  144  121
[12,]  196  169  144
[13,]  225  196  169
[14,]  256  225  196
[15,]  289  256  225
[16,]  324  289  256
[17,]  361  324  289
[18,]  400  361  324
> apply (embed (x, 3), 1, mean)
 [1]   4.666667   9.666667  16.666667  25.666667  36.666667  49.666667
 [7]  64.666667  81.666667 100.666667 121.666667 144.666667 169.666667
[13] 196.666667 225.666667 256.666667 289.666667 324.666667 361.666667
like image 114
Josh Reich Avatar answered Sep 28 '22 14:09

Josh Reich


I scratched up a good answer from Achim Zeileis over on the r list. Here's what he said:

library(zoo)
## create data

x <- rnorm(365)
## transform to regular zoo series with "Date" index

x <- zooreg(x, start = as.Date("2004-01-01")) plot(x)

## add rolling/running/moving average with window size 7 

lines(rollmean(x, 7), col = 2, lwd = 2)

## if you don't want the rolling mean but rather a weekly ## time series of means you can do
nextfri <- function(x) 7 * ceiling(as.numeric(x - 1)/7) + as.Date(1) xw <- aggregate(x, nextfri, mean)

## nextfri is a function which computes for a certain "Date" ## the next friday. xw is then the weekly series. 

lines(xw, col = 4)

Achim went on to say:

Note, that the difference between is rolling mean and the aggregated series is due to different alignments. This can be changed by changing the 'align' argument in rollmean() or the nextfri() function in the aggregate call.

All this came from Achim, not from me: http://tolstoy.newcastle.edu.au/R/help/05/06/6785.html

like image 35
JD Long Avatar answered Sep 28 '22 16:09

JD Long