I figured this out while writing the question, so I'll post it anyway and answer it myself in case someone else needs a little help.
Suppose we have a DataFrame, df, containing this data:
import pandas as pd
from io import StringIO

data = StringIO(
    """\
date spendings category
2014-03-25 10 A
2014-04-05 20 A
2014-04-15 10 A
2014-04-25 10 B
2014-05-05 10 B
2014-05-15 10 A
2014-05-25 10 A
"""
)
# Raw string for the regex separator avoids an invalid-escape warning on "\s".
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")
For each row, I want to sum the spendings over every row that falls within the month leading up to it (including the row itself), ideally using DataFrame.rolling, since its syntax is very clean:
df = df.rolling("M").sum()
But this throws an exception:
ValueError: <MonthEnd> is a non-fixed frequency
version: pandas==0.19.2
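A minimal sketch that reproduces the failure on a small, hypothetical series (the values are illustrative, not taken from the question). The "M" alias corresponds to the MonthEnd offset; the sketch passes pd.offsets.MonthEnd() directly so it behaves the same whether or not a given pandas version has deprecated the "M" string alias. The exact error message may differ between versions, but the call raises ValueError because a month-end offset has no fixed length.

```python
import pandas as pd

# Hypothetical series on a DatetimeIndex (illustrative values only).
s = pd.Series(
    [10, 20, 10],
    index=pd.to_datetime(["2014-03-25", "2014-04-05", "2014-04-15"]),
)

try:
    # MonthEnd is what the "M" alias maps to; it is a calendar offset, not a fixed span.
    s.rolling(pd.offsets.MonthEnd()).sum()
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```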
Window Rolling Mean (Moving Average)
The moving average calculation produces an updated average for each row based on the window we specify. It is also called a "rolling mean" because it averages the values within a specified range for each row as you move along the DataFrame.
In Python, we can calculate a moving average with the .rolling() method. This method provides rolling windows over the data, and calling mean() on those windows gives the moving averages. The window size is passed as a parameter to rolling().
rolling() provides rolling window calculations, which are used mainly in signal processing and time-series analysis. Put simply, we take a window of k values at a time and apply the desired mathematical operation to it.
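A minimal sketch of a fixed-size moving average as described above, using a hypothetical series and a window of k = 3 rows. With the default min_periods, the first k - 1 results are NaN because the window is not yet full.

```python
import pandas as pd

# Illustrative values; the window size k is passed to rolling().
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
k = 3

# Each result is the mean of the current row and the k - 1 rows before it.
moving_avg = values.rolling(window=k).mean()
print(moving_avg.tolist())
# → [nan, nan, 2.0, 3.0, 4.0]
```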
Use the "D" offset rather than "M", and specifically use "30D" for 30 days, which approximates one month.
df = df.rolling("30D").sum()
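A sketch applying the "30D" window to the question's data. One assumption to flag: recent pandas versions no longer silently drop non-numeric columns in rolling aggregations, so this version selects the spendings column explicitly instead of calling rolling on the whole DataFrame as the 0.19.2 code above does.

```python
import pandas as pd
from io import StringIO

data = StringIO(
    """\
date spendings category
2014-03-25 10 A
2014-04-05 20 A
2014-04-15 10 A
2014-04-25 10 B
2014-05-05 10 B
2014-05-15 10 A
2014-05-25 10 A
"""
)
df = pd.read_csv(data, sep=r"\s+", parse_dates=True, index_col="date")

# Time-based window: for each date t, sum the rows dated in (t - 30 days, t].
# Selecting the numeric column avoids errors from the text "category" column.
rolled = df["spendings"].rolling("30D").sum()
print(rolled.tolist())
# → [10.0, 30.0, 40.0, 40.0, 30.0, 30.0, 30.0]
```

Note that the 2014-05-05 row does not include 2014-04-05: the two dates are exactly 30 days apart, and the window's left edge is open by default.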
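Initially, I jumped to "M" because I assumed it meant one month, but "M" is the calendar month-end offset (MonthEnd), and calendar months vary in length (28 to 31 days). Time-based rolling windows need a fixed-length offset, which is why pandas rejects "M" as a non-fixed frequency.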
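A small sketch of why a calendar month cannot serve as a fixed window: the lengths of the months in 2014 (chosen only as an example year) differ, while a "30D" offset is always the same duration.

```python
import pandas as pd

# Month-start dates for 2014; days_in_month gives the length of each calendar month.
months = pd.date_range("2014-01-01", periods=12, freq="MS")
lengths = months.days_in_month.tolist()
print(lengths)
# → [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# A fixed offset such as "30D", by contrast, always spans the same duration.
print(pd.Timedelta("30D"))
# → 30 days 00:00:00
```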