I am trying to compute rolling means of an unbalanced data set. To illustrate my point I have produced this toy example of my data:
ID year Var RollingAvg(Var)
1 2000 2 NA
1 2001 3 2
1 2002 4 2.5
1 2003 2 3
2 2001 2 NA
2 2002 5 2
2 2003 4 3.5
The column RollingAvg(Var)
is what I want, but can't get. In words, I am looking for the rolling average of ALL the previous observations of Var
for each ID
. I have tried using rollapply
and ddply
in the zoo
and the plyr
package, but I can't see how to set the rolling window length to use ALL the previous observations for each ID. Maybe I should use the plm package instead? Any help is appreciated.
I have seen other posts on rolling means on BALANCED panel data set, but I can't seem to extrapolate their answers to unbalanced data.
Thanks,
M
Using data.table
:
library(data.table)
d = data.table(your_df)
d[, RollingAvg := {avg = cumsum(Var)/seq_len(.N);
c(NA, avg[-length(avg)])},
by = ID]
(or even simplified)
d[, RollingAvg := c(NA, head(cumsum(Var)/(seq_len(.N)), -1)), by = ID]
Assuming that years are contiguous within each ID (which is case in the example data) and DF
is the input data frame, here is a solution using just base R. cumRoll
is a function that performs the required operation on one ID and ave
then performs it by ID:
cumRoll <- function(x) c(NA, head(cumsum(x) / seq_along(x), -1))
DF$Roll <- ave(DF$Var, DF$ID, FUN = cumRoll)
The result is:
> DF
ID year Var Roll
1 1 2000 2 NA
2 1 2001 3 2.0
3 1 2002 4 2.5
4 1 2003 2 3.0
5 2 2001 2 NA
6 2 2002 5 2.0
7 2 2003 4 3.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With