Rolling means and applying means at beginning of a series of data

I want to compute a rolling mean of the previous 4 values in a dataset. However, at the beginning of the series, where there are not yet 4 values, I want the rolling mean of the 1, 2, or 3 observations that are available. How do I do this?

 library(zoo)
 df= data.frame(a=c(1,2,3,4,5))
 df$answer = rollapply(df$a, 4,mean)
 #help

For example, row 1 would have a value of 1, row 2 would have a value of (1+2)/2=1.5, row 3 would have a value of 6/3=2.

I want rolling means over 4 periods, but where fewer than 4 periods are available, I want the mean over as many periods as there are.

runningbirds asked Apr 09 '15 19:04


4 Answers

Use right alignment with partial=TRUE, i.e. rollapplyr(..., partial=TRUE) or rollapply(..., align = "right", partial=TRUE). Here we use rollapplyr:

rollapplyr(df$a, 4, mean, partial = TRUE)
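
For completeness, here is a minimal sketch of how that call might slot into the data frame from the question. It assumes the zoo package is installed; the column name answer is taken from the question.

```r
library(zoo)

# Data from the question
df <- data.frame(a = c(1, 2, 3, 4, 5))

# Right-aligned window of width 4; partial = TRUE lets the first
# three windows use only the 1, 2, or 3 values available so far
df$answer <- rollapplyr(df$a, width = 4, FUN = mean, partial = TRUE)

df$answer
# [1] 1.0 1.5 2.0 2.5 3.5
```

Because partial = TRUE returns one value per input element, the result has the same length as df$a and can be assigned to a column directly.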
G. Grothendieck answered Nov 14 '22 21:11


As an alternative, I think this can be done with a small helper function such as the following:

library(zoo)

# assumes length(myvec) >= width
rollapply2 <- function(myvec, width, fun){
  #partial windows: apply fun to the first 1, 2, ..., width-1 values
  firstvalues  <- sapply(seq_len(width - 1), function(i) fun(myvec[seq_len(i)]))
  #the rest of the values as normal (full windows)
  normalvalues <- rollapply(myvec, width, fun)
  #return them all
  c(firstvalues, normalvalues)
}

Output:

> rollapply2(df$a, 4, mean)
[1] 1.0 1.5 2.0 2.5 3.5
LyzandeR answered Nov 14 '22 22:11


You can also do this without a package:

sapply(seq_along(df$a), function(u) mean(df$a[max(u-3,1):u]))
#[1] 1.0 1.5 2.0 2.5 3.5

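Or a vectorized solution in base R, without a loop (assumes at least 4 values; the lag is taken on the cumulative sum so that it also holds for longer series):

with(df, {cs <- cumsum(a); (cs - c(rep(0,4),head(cs,-4)))/pmin(seq_along(a),4)})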
#[1] 1.0 1.5 2.0 2.5 3.5
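
The cumulative-sum idea above can be wrapped in a function that takes the window width as a parameter. This is only a sketch: the name partial_roll_mean is hypothetical, and it assumes a numeric vector without NAs. Each window sum is the cumulative sum minus the cumulative sum from width positions earlier, and the divisor is the number of values actually in the window.

```r
# Hypothetical helper: right-aligned rolling mean with partial
# windows at the start, base R only
partial_roll_mean <- function(x, width) {
  n  <- length(x)
  cs <- cumsum(x)
  # cumulative sum lagged by `width` positions, zero-padded at the start;
  # min/max keep the lengths consistent when n < width
  lagged <- c(rep(0, min(width, n)), head(cs, max(n - width, 0)))
  # window sum divided by the number of values in each window
  (cs - lagged) / pmin(seq_len(n), width)
}

partial_roll_mean(c(1, 2, 3, 4, 5), 4)
# [1] 1.0 1.5 2.0 2.5 3.5
```

Unlike the one-liner, this version also returns sensible results when the series is shorter than the window, e.g. partial_roll_mean(c(1, 2), 4) gives 1.0 1.5.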
Colonel Beauvel answered Nov 14 '22 23:11


What about padding the start with extra NAs and dropping them with na.rm = TRUE?

rollapply(c(rep(NA, 3),df$a), 4, FUN = mean, align = "right", na.rm = TRUE)
bergant answered Nov 14 '22 22:11