
pandas rolling cumsum over the trailing n elements

Tags: pandas, cumsum

Using pandas, what is the easiest way to calculate a rolling cumsum over the trailing n elements (the current one included), for instance to calculate trailing three-day sales:

import numpy
import pandas
df = pandas.Series(numpy.random.randint(0, 10, 10), index=pandas.date_range('2020-01', periods=10))
df
2020-01-01    8
2020-01-02    4
2020-01-03    1
2020-01-04    0
2020-01-05    5
2020-01-06    8
2020-01-07    3
2020-01-08    8
2020-01-09    9
2020-01-10    0
Freq: D, dtype: int64

Desired output:

2020-01-01     8
2020-01-02    12
2020-01-03    13
2020-01-04     5
2020-01-05     6
2020-01-06    13
2020-01-07    16
2020-01-08    19
2020-01-09    20
2020-01-10    17
Freq: D, dtype: int64
CarlosE asked May 27 '17 21:05



1 Answer

You need rolling.sum:

df.rolling(3, min_periods=1).sum()
Out: 
2020-01-01     8.0
2020-01-02    12.0
2020-01-03    13.0
2020-01-04     5.0
2020-01-05     6.0
2020-01-06    13.0
2020-01-07    16.0
2020-01-08    19.0
2020-01-09    20.0
2020-01-10    17.0
dtype: float64

min_periods=1 ensures the first two elements are calculated, too. For an integer window, min_periods defaults to the window size, so with a window of 3 and no min_periods the first two elements would be NaN.
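As a rough, self-contained sketch of the point about min_periods, the snippet below hard-codes the sample values from the question so the result is reproducible. The variable names (sales, trailing, strict, and so on) are made up for illustration. The cast back to int64 and the offset-based '3D' window are optional extras that the answer itself does not use.

```python
import pandas as pd

# Sample values copied from the question, fixed so the result is reproducible.
sales = pd.Series(
    [8, 4, 1, 0, 5, 8, 3, 8, 9, 0],
    index=pd.date_range("2020-01-01", periods=10),
)

# Trailing 3-element sum; min_periods=1 lets the first two rows use
# whatever values are available instead of returning NaN.
trailing = sales.rolling(3, min_periods=1).sum()

# Without min_periods, an integer window needs 3 values, so rows 0 and 1 are NaN.
strict = sales.rolling(3).sum()

# rolling().sum() returns float64; cast back if integer output is wanted
# (only safe here because no NaN values remain).
trailing_int = trailing.astype("int64")

# Alternative: a time-based window of three calendar days on the DatetimeIndex.
# Offset windows default to min_periods=1.
by_days = sales.rolling("3D").sum()

print(trailing_int)
```

On a gap-free daily index the integer window and the '3D' window give the same numbers. They differ once dates are missing, because '3D' counts calendar days rather than rows.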

ayhan answered Oct 28 '22 06:10