
applying rolling mean by group in R

I'm an R newbie and I'm having a lot of trouble doing something that is probably very simple. I have a big dataset split up into groups by country code, and I want to take a 3-month rolling average of a price index, by country, and then put it into a new column that matches up to the appropriate month. I've been trying to use rollmean like this with no success (code and error messages below):

> leader$last3<-tapply(leader, leader$ccode, 
    function(x) rollmean(leader$GI_delta, 3, na.pad=T))
Error in tapply(leader, leader$ccode, function(x) rollmean(leader$GI_delta,  : 
  arguments must have same length

> leader$last3<-ddply(leader, .(ccode), 
    rollmean(GI_delta, 3, na.pad=T))

Error in llply(.data = .data, .fun = .fun, ..., .progress = .progress,  : 
  .fun is not a function.

Any help would be much appreciated!

Steve Palley asked Mar 10 '12 06:03


2 Answers

If you want to make a new column, then try using ave. It resembles tapply but returns a vector of the same length as its first argument. My experience is that it is a lot faster than ddply:

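library(zoo)
# na.pad is deprecated in zoo; fill = NA pads the ends so each group keeps its length.
# The window is centered by default; add align = "right" for a trailing 3-month mean.
leader$last3 <- ave(leader$GI_delta, leader$ccode,
                    FUN = function(x) rollmean(x, k = 3, fill = NA))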
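
As a rough check of the ave approach, here is a minimal sketch on made-up data. The toy values and the choice of align = "right" are assumptions for illustration: a right-aligned window averages the current month and the two before it, which seems to be what a column called last3 is after.

```r
library(zoo)

# Toy data (hypothetical): two countries, five months each
leader <- data.frame(
  ccode    = rep(c("A", "B"), each = 5),
  GI_delta = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# Right-aligned 3-month mean within each country; the first two
# months of each country have no full window, so they become NA
leader$last3 <- ave(
  leader$GI_delta, leader$ccode,
  FUN = function(x) rollmean(x, k = 3, fill = NA, align = "right")
)

print(leader)
```

The window restarts at each country, so the first two rows of country B are NA and do not borrow values from the end of country A.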
IRTFM answered Sep 21 '22 22:09


In your first attempt, your function does not use its x argument, so it always returns the same thing: the rolling mean of the whole GI_delta column, which has the wrong length for any single group. In addition, the first argument of tapply should be a vector, not a data.frame. Lastly, tapply returns a list of vectors: you cannot put the result directly into a data.frame.

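library(zoo)
n <- 10
# Example data: 3 country codes with n random observations each
leader <- data.frame(
  ccode = rep(LETTERS[1:3], each = n),
  GI_delta = rnorm(3 * n)
)
# Pass the column (a vector) and use x inside the function.
# fill = NA replaces the deprecated na.pad = TRUE.
tapply(
  leader$GI_delta,
  leader$ccode,
  function(x) rollmean(x, 3, fill = NA)
)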
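
If you do want to stay with tapply, one way to get its list back into a column is base R's unsplit, which reverses the grouping. The following is only a sketch on made-up data; the by_country name and the toy values are assumptions for illustration.

```r
library(zoo)

# Toy data (hypothetical): two countries, four months each
leader <- data.frame(
  ccode    = rep(c("A", "B"), each = 4),
  GI_delta = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# One vector of centered 3-month means per country (a list, not a column)
by_country <- tapply(
  leader$GI_delta, leader$ccode,
  function(x) rollmean(x, 3, fill = NA)
)

# unsplit() puts each element back at the row it came from
leader$last3 <- unsplit(by_country, leader$ccode)

print(leader)
```

This works because each vector in the list has the same length as its group, which is what fill = NA guarantees.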

In your second example, the third argument of ddply should be a function, not an expression. If you want to use an expression, you can pass summarize or transform as the function and give the expressions as further arguments. summarize returns a 1-row data.frame for each value of ccode, while transform keeps the number of rows unchanged.

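library(plyr)
# transform is the function; last3 = ... is evaluated within each ccode group.
# fill = NA replaces the deprecated na.pad = TRUE.
ddply(
  leader, "ccode",
  transform,
  last3 = rollmean(GI_delta, 3, align = "right", fill = NA)
)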
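
To illustrate the "should be a function" point, here is a small sketch on made-up data that passes an anonymous function to ddply and then contrasts it with summarize. The toy values and the result, means and mean_delta names are assumptions for illustration.

```r
library(plyr)
library(zoo)

# Toy data (hypothetical): two countries, four months each
leader <- data.frame(
  ccode    = rep(c("A", "B"), each = 4),
  GI_delta = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# The function receives the rows for one country as a data frame
# and returns that data frame with the new column attached
result <- ddply(leader, "ccode", function(d) {
  d$last3 <- rollmean(d$GI_delta, 3, fill = NA, align = "right")
  d
})

# summarize collapses each country to a single row instead
means <- ddply(leader, "ccode", summarize, mean_delta = mean(GI_delta))

print(result)
print(means)
```

The anonymous function gives the same shape of result as the transform call, with one row per original observation, while the summarize call returns one row per country.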
Vincent Zoonekynd answered Sep 24 '22 22:09