
Get mean of last N weekdays for pandas dataframe

Assume my data is daily counts indexed by a DatetimeIndex. Is there a way to get the average over the past n occurrences of the same weekday? For instance, if the date is Sunday, August 15th, I'd like the mean of the counts on Sunday, August 8th, Sunday, August 1st, and so on.

I started using pandas yesterday, so here's what I've brute forced.

# df is a DataFrame of daily counts with a DatetimeIndex
# brute force: mean over the last n same weekdays (lnwd = last n weekdays)
def lnwd(n=1):
    total = df.shift(7)  # counts from the same weekday one week earlier
    tmp = total
    for _ in range(n - 1):
        tmp = tmp.shift(7)  # step back one more week
        total = total + tmp
    return total / n  # average

There has to be a one-liner, right? Is there a way to use apply() without passing a function that contains a for loop (since n is variable), or some form of groupby? For instance, the mean of all data on each weekday can be found with:

df.groupby(lambda x: x.dayofweek).mean() # mean of each MTWHFSS
ehacinom asked Sep 05 '14 22:09



2 Answers

I think you are looking for a rolling apply (a rolling mean in this case). See the docs: http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments. Because it has to be applied to each weekday separately, combine rolling_mean with a groupby on the weekday.

This should give something like the following (with a Series s):

s.groupby(s.index.weekday).transform(lambda x: pd.rolling_mean(x, window=n))
joris answered Oct 13 '22 21:10



With pandas 1.4.1, the solution provided by joris no longer works ("module 'pandas' has no attribute 'rolling_mean'"). The same result can be achieved using:

s.groupby(s.index.weekday).transform(lambda x: x.rolling(window=n).mean())
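
To try this end to end, here is a minimal sketch on made-up data (the Series s, the dates, and n = 3 are assumptions for illustration, not from the question). Note that a rolling window within each weekday group ends on the current day. If only the previous n same weekdays should count, as in the question's shift(7) loop, one option is to shift within each group before rolling:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 35 consecutive days of daily counts (0, 1, ..., 34).
idx = pd.date_range("2014-08-04", periods=35, freq="D")
s = pd.Series(np.arange(35, dtype=float), index=idx)
n = 3  # how many same weekdays to average over

# Rolling mean per weekday group; each window INCLUDES the current day.
rolling_incl = s.groupby(s.index.weekday).transform(
    lambda x: x.rolling(window=n).mean()
)

# Shift by one position inside each weekday group first, so each window
# covers only the n PREVIOUS same weekdays.
rolling_prev = s.groupby(s.index.weekday).transform(
    lambda x: x.shift(1).rolling(window=n).mean()
)

# Brute-force reference in the spirit of the question: average of the
# values 7, 14, ..., 7 * n days earlier.
brute = sum(s.shift(7 * k) for k in range(1, n + 1)) / n

# The shifted variant agrees with the brute-force result (NaNs included).
print(np.allclose(rolling_prev.to_numpy(), brute.to_numpy(), equal_nan=True))
```

The first n values in each weekday group come out as NaN in the shifted variant, because fewer than n earlier observations exist for them. Passing min_periods to rolling() would relax that, at the cost of averaging over fewer weeks.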
Irene Palnau answered Oct 13 '22 21:10
