
Generating a moving sum variable in R

I suspect this is a somewhat simple question with multiple solutions, but I'm still a bit of a novice in R and an exhaustive search didn't yield answers that spoke well to what I'm wanting to do.

I'm trying to create, for lack of a better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of the observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of what I would like to do, where x5yrsum is the sum of all x in the five years prior to the observation year.

country     year      x      x5yrsum
  A         1980      9        NA
  A         1981      3        NA
  A         1982      5        NA
  A         1983      6        NA
  A         1984      9        NA
  A         1985      7        32
  A         1986      9        30
  A         1987      4        36

  .....................

  B         1990      0        NA
  B         1991      4        NA
  B         1992      2        NA
  B         1993      6        NA
  B         1994      3        NA
  B         1995      7        15
  B         1996      0        22

This is unbalanced panel data. I suspect ddply would be appropriate, but I wouldn't know the exact coding for it.

Any input would be appreciated.

steve asked Jul 10 '13 14:07


2 Answers

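You can use filter (the linear filter from the stats package) in ddply (or any other function implementing the "split-apply-combine" approach). With sides=1 the weights apply to the current value and the values before it, so the leading 0 excludes the current year and the five 1s sum the five preceding rows: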

library(plyr)
ddply(DF, .(country), transform, 
          x5yrsum2 = as.numeric(filter(x,c(0,rep(1,5)),sides=1)))

#    country year x x5yrsum x5yrsum2
# 1        A 1980 9      NA       NA
# 2        A 1981 3      NA       NA
# 3        A 1982 5      NA       NA
# 4        A 1983 6      NA       NA
# 5        A 1984 9      NA       NA
# 6        A 1985 7      32       32
# 7        A 1986 9      30       30
# 8        A 1987 4      36       36
# 9        B 1990 0      NA       NA
# 10       B 1991 4      NA       NA
# 11       B 1992 2      NA       NA
# 12       B 1993 6      NA       NA
# 13       B 1994 3      NA       NA
# 14       B 1995 7      15       15
# 15       B 1996 0      22       22
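
To try this on the question's data without plyr, something like the following should work. It is only a sketch: the way DF is rebuilt here and the helper name `lag_sum` are assumptions, not part of the original post. It applies the same one-sided filter per country with base R's `ave`, and calls `stats::filter` explicitly because a package such as dplyr, if loaded, masks `filter`.

```r
# Rebuild the question's example data (this construction is an assumption).
DF <- data.frame(
  country = rep(c("A", "B"), times = c(8, 7)),
  year    = c(1980:1987, 1990:1996),
  x       = c(9, 3, 5, 6, 9, 7, 9, 4, 0, 4, 2, 6, 3, 7, 0)
)

# Hypothetical helper: the weights apply to x[t], x[t-1], ..., x[t-k].
# The leading 0 drops x[t], so the result is the sum of the k previous rows.
lag_sum <- function(x, k = 5) {
  as.numeric(stats::filter(x, c(0, rep(1, k)), sides = 1))
}

# Apply the helper separately within each country.
DF$x5yrsum <- ave(DF$x, DF$country, FUN = lag_sum)
DF
```

Like the ddply version, this sums over the previous five rows, so it assumes each country has one row per year with no missing years.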
Roland answered Oct 16 '22 19:10


If DF is the input three-column data frame (country, year, x), then use ave with rollapplyr from zoo. Note that we use a width of k+1 and then drop the (k+1)th element, i.e. the last one in each window, from the sum, so that the current value of x is excluded and only the k values before it are summed:

library(zoo)

k <- 5
roll <- function(x) rollapplyr(x, k+1, function(x) sum(x[-k-1]), fill = NA)
transform(DF, x5yrsum = ave(x, country, FUN = roll))

which gives:

   country year x x5yrsum
1        A 1980 9      NA
2        A 1981 3      NA
3        A 1982 5      NA
4        A 1983 6      NA
5        A 1984 9      NA
6        A 1985 7      32
7        A 1986 9      30
8        A 1987 4      36
9        B 1990 0      NA
10       B 1991 4      NA
11       B 1992 2      NA
12       B 1993 6      NA
13       B 1994 3      NA
14       B 1995 7      15
15       B 1996 0      22
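
Both answers above sum over the previous k rows, which matches the previous k years only when a country has no missing years. Since the question mentions unbalanced panel data, here is a hedged base-R sketch that windows on the year values instead. The helper `lag_sum_by_year`, the example data `DF2`, the rule of returning NA unless all k previous years are present, and the assumption of one row per country and year are all mine, not part of the original answers.

```r
# Hypothetical helper: for each row, sum x over the k calendar years before
# that row's year. Returns NA unless all k of those years are present.
# Assumes at most one row per year within the vectors passed in.
lag_sum_by_year <- function(x, year, k = 5) {
  vapply(seq_along(x), function(i) {
    prev <- year >= year[i] - k & year < year[i]
    if (sum(prev) == k) sum(x[prev]) else NA_real_
  }, numeric(1))
}

# Made-up panel in which 1983 is missing for country A.
DF2 <- data.frame(
  country = "A",
  year    = c(1980, 1981, 1982, 1984, 1985, 1986, 1987, 1988, 1989),
  x       = c(9, 3, 5, 9, 7, 9, 4, 1, 2)
)

# ave() over row indices lets the helper see both x and year per country.
DF2$x5yrsum <- ave(
  seq_len(nrow(DF2)), DF2$country,
  FUN = function(i) lag_sum_by_year(DF2$x[i], DF2$year[i])
)
DF2
```

Here only 1989 gets a value (9 + 7 + 9 + 4 + 1 = 30, for 1984 to 1988); a row-based window would instead report a sum at 1986 that silently spans 1980 to 1985. The loop is quadratic in the group size, which should be fine for typical country-year panels.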
G. Grothendieck answered Oct 16 '22 19:10