Using pandas, what is the easiest way to calculate a rolling cumsum over the previous n elements, for instance to calculate trailing three-day sales?
df = pandas.Series(numpy.random.randint(0,10,10), index=pandas.date_range('2020-01', periods=10))
df
2020-01-01 8
2020-01-02 4
2020-01-03 1
2020-01-04 0
2020-01-05 5
2020-01-06 8
2020-01-07 3
2020-01-08 8
2020-01-09 9
2020-01-10 0
Freq: D, dtype: int64
Desired output:
2020-01-01 8
2020-01-02 12
2020-01-03 13
2020-01-04 5
2020-01-05 6
2020-01-06 13
2020-01-07 16
2020-01-08 19
2020-01-09 20
2020-01-10 17
Freq: D, dtype: int64
The cumsum() method returns an object of the same shape as its input, holding the running total up to and including each row. It walks through the values from the top, row by row, adding each value to the total of the rows above it, so the last row contains the sum of all values in each column.
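As a minimal sketch of that behaviour, the snippet below applies cumsum() to the first five values from the question. The variable names `sales` and `running_total` are only illustrative.

```python
import pandas as pd

# First five daily values from the question's example series.
sales = pd.Series(
    [8, 4, 1, 0, 5],
    index=pd.date_range("2020-01-01", periods=5),
)

# cumsum() never drops old values: each row is the total of everything so far.
running_total = sales.cumsum()
print(running_total)
```

Note that the total never "forgets" older rows, which is why plain cumsum() does not give a trailing three-day figure.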
Window Rolling Mean (Moving Average)
The moving average calculation produces an updated average for each row based on the window we specify. It is also called a "rolling mean" because it averages the values within a specified range for each row as you move along the DataFrame.
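A rolling mean over a window of three rows could look roughly like this. The sample values are taken from the question; the names `sales` and `moving_avg` are assumptions made for the example.

```python
import pandas as pd

# First six daily values from the question's example series.
sales = pd.Series(
    [8, 4, 1, 0, 5, 8],
    index=pd.date_range("2020-01-01", periods=6),
)

# Average of the current row and the two rows before it.
# With the default min_periods, the first two rows are NaN
# because fewer than three values are available there.
moving_avg = sales.rolling(3).mean()
print(moving_avg)
```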
To calculate a cumulative percentage, first create a DataFrame (here called data_frame) by passing the column values to pd.DataFrame(). Then use the built-in cumsum() and sum() methods: divide the cumulative sum of the column by the column's total and multiply by 100.
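The steps for a cumulative percentage might be sketched as follows. The column names `sales` and `cum_pct` are hypothetical, and the values are simply the ones from the question.

```python
import pandas as pd

# Hypothetical single-column frame holding the question's values.
data_frame = pd.DataFrame({"sales": [8, 4, 1, 0, 5, 8, 3, 8, 9, 0]})

# Running total divided by the grand total, expressed as a percentage.
total = data_frame["sales"].sum()
data_frame["cum_pct"] = 100 * data_frame["sales"].cumsum() / total
print(data_frame)
```

The last row of `cum_pct` reaches 100 because the running total there equals the column total.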
You need rolling.sum:
df.rolling(3, min_periods=1).sum()
Out:
2020-01-01 8.0
2020-01-02 12.0
2020-01-03 13.0
2020-01-04 5.0
2020-01-05 6.0
2020-01-06 13.0
2020-01-07 16.0
2020-01-08 19.0
2020-01-09 20.0
2020-01-10 17.0
dtype: float64
min_periods=1 ensures the first two elements are calculated, too. With a window size of 3, by default, the first two elements are NaN.
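To see the effect of min_periods, here is a small sketch that uses the fixed values from the question instead of random ones, so the result can be checked against the desired output. The names `default`, `trailing` and `by_offset` are illustrative, and the cast back to int64 is an optional extra, not part of the answer above.

```python
import pandas as pd

# The exact values shown in the question, so the output is reproducible.
df = pd.Series(
    [8, 4, 1, 0, 5, 8, 3, 8, 9, 0],
    index=pd.date_range("2020-01-01", periods=10),
)

# Default: a full window of 3 is required, so the first two rows are NaN.
default = df.rolling(3).sum()

# min_periods=1: partial windows at the start are summed as well.
# rolling() returns floats; round and cast to get integers back.
trailing = df.rolling(3, min_periods=1).sum().round().astype("int64")

# Alternative for a DatetimeIndex: a time-based window of three days.
# For offset windows, min_periods defaults to 1.
by_offset = df.rolling("3D").sum()

print(default.head(3))
print(trailing)
```

The time-based form matches the fixed-size window here only because the index has exactly one row per day; with gaps in the dates the two would differ.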