Pandas monthly rolling operation

Tags:

python

pandas

I figured this out while writing the question, so I'm posting it anyway and answering it myself in case someone else needs a little help.

Problem

Suppose we have a DataFrame, df, containing this data.

import pandas as pd
from io import StringIO

data = StringIO(
"""\
date          spendings  category
2014-03-25    10         A
2014-04-05    20         A
2014-04-15    10         A
2014-04-25    10         B
2014-05-05    10         B
2014-05-15    10         A
2014-05-25    10         A
"""
)

df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

Goal

For each row, sum the spendings of every row that falls within the month leading up to it (including the row itself), ideally using DataFrame.rolling, since its syntax is very clean.

What I have tried

df = df.rolling("M").sum() 

But this raises an exception:

ValueError: <MonthEnd> is a non-fixed frequency 

version: pandas==0.19.2
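A small sketch of where that message seems to come from: pandas offsets expose a documented `nanos` attribute that raises `ValueError` ("... is a non-fixed frequency") when the offset has no fixed length. The helper `is_fixed_frequency` is hypothetical, and the assumption that `rolling` performs an equivalent check is based only on the identical error text.

```python
import pandas as pd

def is_fixed_frequency(offset) -> bool:
    """Hypothetical helper: True if the pandas offset has a fixed length in nanoseconds."""
    try:
        offset.nanos  # raises ValueError for non-fixed offsets such as MonthEnd
        return True
    except ValueError:
        return False

print(is_fixed_frequency(pd.offsets.MonthEnd()))  # False: months vary from 28 to 31 days
print(is_fixed_frequency(pd.offsets.Hour(720)))   # True: 720 hours is a fixed duration
```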

Asked Apr 22 '17 07:04 by Filip Kilibarda




1 Answer

Use a fixed-length, day-based offset instead of "M": "30D" gives a 30-day window, which approximates one month.

df = df.rolling("30D").sum() 
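A self-contained version of the answer, rebuilding the question's data. It selects only the spendings column because newer pandas versions may raise an error when a rolling sum reaches the string category column (an assumption about versions after 0.19.2). Time-based windows are closed on the right by default, so each row's window covers (t - 30 days, t].

```python
import pandas as pd
from io import StringIO

data = StringIO(
    """\
date          spendings  category
2014-03-25    10         A
2014-04-05    20         A
2014-04-15    10         A
2014-04-25    10         B
2014-05-05    10         B
2014-05-15    10         A
2014-05-25    10         A
"""
)
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

# Sum only the numeric column over a trailing 30-day window
monthly = df["spendings"].rolling("30D").sum()
print(monthly)
```

For this data the sums are 10, 30, 40, 40, 30, 30, 30. For example, the 2014-05-05 row excludes 2014-04-05 because that date lies exactly on the open left edge of its window.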

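Initially I reached for "M" because I assumed it meant "one month", but "M" is the MonthEnd offset, and since months range from 28 to 31 days it has no fixed length. A time-based rolling window needs a fixed width, hence the "non-fixed frequency" error.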
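If an exact calendar month matters more than 30 days, one alternative (not from the original answer) is to build each window with `pd.DateOffset(months=1)` in a plain loop. `calendar_month_rolling_sum` is a hypothetical helper; it is O(n²), so treat it as a sketch for small frames only.

```python
import pandas as pd

def calendar_month_rolling_sum(s: pd.Series) -> pd.Series:
    """Hypothetical helper: for each timestamp t, sum values in (t - 1 calendar month, t]."""
    idx = s.index
    sums = []
    for t in idx:
        start = t - pd.DateOffset(months=1)
        # Open on the left, closed on the right, mirroring rolling's default for time windows
        mask = (idx > start) & (idx <= t)
        sums.append(s.loc[mask].sum())
    return pd.Series(sums, index=idx, name=s.name)

s = pd.Series(
    [10, 20, 10, 10, 10, 10, 10],
    index=pd.to_datetime(["2014-03-25", "2014-04-05", "2014-04-15", "2014-04-25",
                          "2014-05-05", "2014-05-15", "2014-05-25"]),
    name="spendings",
)
print(calendar_month_rolling_sum(s))
```

On the question's data this happens to give the same sums as "30D"; the two only diverge when a row falls between the 30-day edge and the calendar-month edge.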

Answered Sep 24 '22 22:09 by Filip Kilibarda
