
How to sum python pandas dataframe in certain time range

Tags: python, pandas, sum

I have a DataFrame df like this:

    order_date    amount
0   2015-10-02      1
1   2015-12-21      15
2   2015-12-24      3
3   2015-12-26      4
4   2015-12-27      5
5   2015-12-28      10

For each row, I would like to sum df["amount"] over all rows whose df["order_date"] falls in the range from that row's order_date to order_date + 6 days:

    order_date    amount   sum
0   2015-10-02      1       1 
1   2015-12-21      15      27  //comes from 15 + 3 + 4 + 5
2   2015-12-24      3       22  //comes from 3 + 4 + 5 + 10
3   2015-12-26      4       19
4   2015-12-27      5       15
5   2015-12-28      10      10

The data type of order_date is datetime. I have tried to use iloc, but it did not work well. If anyone has an idea or an example of how to do this, please let me know.

asked May 24 '17 15:05

Leigh Love



1 Answer

If pandas rolling allowed a left-aligned window (the default is right-aligned), the answer would be a simple one-liner: df.set_index('order_date').amount.rolling('7d', min_periods=1, align='left').sum(). However, forward-looking windows have not been implemented yet (i.e. rolling does not accept an align parameter). So the trick I came up with is to "reverse" the dates temporarily: a backward-looking 7-day window over the reversed dates covers the same rows as a forward-looking window over the original dates. Solution:

# Mirror the dates around the latest one, so later dates sort first.
latest = df.order_date.max()
df.index = latest + (latest - df.order_date)
df['sum'] = df.sort_index().amount.rolling('7d', min_periods=1).sum()
df = df.reset_index(drop=True)

Output:

  order_date  amount   sum
0 2015-10-02       1   1.0
1 2015-12-21      15  27.0
2 2015-12-24       3  22.0
3 2015-12-26       4  19.0
4 2015-12-27       5  15.0
5 2015-12-28      10  10.0
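
For comparison, the same forward-looking sums can be computed without rolling at all. The following is only a sketch of a different technique (a cumulative sum plus numpy.searchsorted on the sorted dates), not the reversed-dates trick above. The helper name forward_window_sum and its days parameter are made up for illustration, and the sketch assumes order_date has a datetime64 dtype and treats the window as [order_date, order_date + days).

```python
import numpy as np
import pandas as pd


def forward_window_sum(df, days=7):
    """Sum `amount` over [order_date, order_date + days) for each row.

    Hypothetical helper for illustration; returns a copy sorted by order_date.
    """
    out = df.sort_values("order_date").reset_index(drop=True)
    dates = out["order_date"].to_numpy()
    # cumsum[i] is the total of the first i amounts, so cumsum[0] == 0.
    cumsum = np.concatenate(([0], out["amount"].to_numpy().cumsum()))
    # start: first row with the same date; end: first row at or past date + days.
    start = np.searchsorted(dates, dates, side="left")
    end = np.searchsorted(dates, dates + np.timedelta64(days, "D"), side="left")
    out["sum"] = cumsum[end] - cumsum[start]
    return out


# Sample data from the question.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2015-10-02", "2015-12-21", "2015-12-24",
        "2015-12-26", "2015-12-27", "2015-12-28",
    ]),
    "amount": [1, 15, 3, 4, 5, 10],
})

result = forward_window_sum(df, days=7)
print(result)
```

On the sample data this gives the same sums as the table above (1, 27, 22, 19, 15, 10), with the sum column kept as integers. Unlike the rolling version, rows that share the same order_date are all counted in each other's window, which may or may not be what you want.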
answered Sep 20 '22 23:09

tozCSS