
How to apply rolling functions in a group by object in pandas

Tags: python, pandas, dataframe, group-by, apply

I'm having difficulty solving a look-back (rolling window) problem on a dataframe, perhaps with groupby.

The following is a simple example of the dataframe I have:

              fruit    amount    
   20140101   apple     3
   20140102   apple     5
   20140102   orange    10
   20140104   banana    2
   20140104   apple     10
   20140104   orange    4
   20140105   orange    6
   20140105   grape     1
   …
   20141231   apple     3
   20141231   grape     2

For every day, I need to calculate the average 'amount' of each fruit over the previous 3 days, and create the following data frame:

              fruit     average_in_last 3 days
   20140104   apple      4
   20140104   orange     10
   ...

For example, on 20140104 the previous 3 days are 20140101, 20140102 and 20140103 (note that the dates in the data frame are not continuous and 20140103 does not exist). The average amount of apple is (3+5)/2 = 4 and of orange is 10/1 = 10; the rest are 0.

The sample data frame is very simple but the actual data frame is much more complicated and larger. Hope someone can shed some light on this, thank you in advance!


asked Feb 21 '15 05:02

user6396

2 Answers

Assuming we start with a data frame like this:

>>> df
             fruit  amount
2017-06-01   apple       1
2017-06-03   apple      16
2017-06-04   apple      12
2017-06-05   apple       8
2017-06-06   apple      14
2017-06-08   apple       1
2017-06-09   apple       4
2017-06-02  orange      13
2017-06-03  orange       9
2017-06-04  orange       9
2017-06-05  orange       2
2017-06-06  orange      11
2017-06-07  orange       6
2017-06-08  orange       3
2017-06-09  orange       3
2017-06-10  orange      13
2017-06-02   grape      14
2017-06-03   grape      16
2017-06-07   grape       4
2017-06-09   grape      15
2017-06-10   grape       5

>>> import pandas as pd
>>> dates = [i.date() for i in pd.date_range('2017-06-01', '2017-06-10')]

>>> temp = (df.groupby('fruit')['amount']
    .apply(lambda x: x.reindex(dates)  # add the missing dates to each group
                      .fillna(0)   # treat the missing dates as an amount of 0
                      .rolling(3)
                      .sum()) # rolling sum over 3 rows (current day and the 2 before it)
    .reset_index()
    .rename(columns={'amount': 'sum_of_3_days', 
                     'level_1': 'date'}))  # rename the unnamed date index level to 'date'


>>> temp.head()
   fruit        date  sum_of_3_days
0  apple  2017-06-01            NaN
1  apple  2017-06-02            NaN
2  apple  2017-06-03           17.0
3  apple  2017-06-04           28.0
4  apple  2017-06-05           36.0

# convert the date index into a date column
>>> df = df.reset_index().rename(columns={'index': 'date'})
>>> df = df.merge(temp, on=['fruit', 'date'])  # merge returns a new frame, so assign it
>>> df
          date   fruit  amount  sum_of_3_days
0   2017-06-01   apple       1                NaN
1   2017-06-03   apple      16               17.0
2   2017-06-04   apple      12               28.0
3   2017-06-05   apple       8               36.0
4   2017-06-06   apple      14               34.0
5   2017-06-08   apple       1               15.0
6   2017-06-09   apple       4                5.0
7   2017-06-02  orange      13                NaN
8   2017-06-03  orange       9               22.0
9   2017-06-04  orange       9               31.0
10  2017-06-05  orange       2               20.0
11  2017-06-06  orange      11               22.0
12  2017-06-07  orange       6               19.0
13  2017-06-08  orange       3               20.0
14  2017-06-09  orange       3               12.0
15  2017-06-10  orange      13               19.0
16  2017-06-02   grape      14                NaN
17  2017-06-03   grape      16               30.0
18  2017-06-07   grape       4                4.0
19  2017-06-09   grape      15               19.0
20  2017-06-10   grape       5               20.0
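
For what it's worth, the code above gives a rolling 3-day sum that includes the current day. If you want the average over the previous 3 calendar days, as in the question's example, a time-based window may be simpler. This is only a minimal sketch: it assumes a pandas version that supports offset windows with closed='left' on a groupby, and the sample frame and the names prev_avg and average_in_last_3_days are mine, built from the rows shown in the question (the elided rows are left out).

```python
import pandas as pd

# Sample rows taken from the question, with the date as a real datetime column.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-01-01", "2014-01-02", "2014-01-02", "2014-01-04",
             "2014-01-04", "2014-01-04", "2014-01-05", "2014-01-05"]
        ),
        "fruit": ["apple", "apple", "orange", "banana",
                  "apple", "orange", "orange", "grape"],
        "amount": [3, 5, 10, 2, 10, 4, 6, 1],
    }
)

prev_avg = (
    df.sort_values("date")            # time-based windows need dates in order
      .set_index("date")
      .groupby("fruit")["amount"]
      .rolling("3D", closed="left")   # window [t - 3 days, t): excludes the current day
      .mean()
      .rename("average_in_last_3_days")
      .reset_index()
)

print(prev_avg)
```

Rows with no observation in the previous 3 days come out as NaN; chain .fillna(0) if you want 0 there, as the question describes.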


answered Sep 27 '22 19:09

dbokers

I also wanted to use rolling with groupby, which is why I landed on this page, but I believe I have a workaround that is better than the previous suggestions.

You could do the following:

pivoted_df = pd.pivot_table(df, index='date', columns='fruit', values='amount')
average_fruits = pivoted_df.rolling(window=3).mean().stack().reset_index()

The .stack() is not necessary, but it transforms the pivot table back into a regular (long-format) df.
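
One caveat with the pivot approach: rolling(window=3) counts rows, not days, so gaps in the dates will skew the window. A possible sketch that reindexes to every calendar day first and then shifts by one day so the current day is excluded. The sample frame is built from the rows shown in the question, and the names all_days and avg_prev_3 are just for illustration.

```python
import pandas as pd

# Sample rows taken from the question, with 'date' and 'fruit' as columns.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2014-01-01", "2014-01-02", "2014-01-02", "2014-01-04",
             "2014-01-04", "2014-01-04", "2014-01-05", "2014-01-05"]
        ),
        "fruit": ["apple", "apple", "orange", "banana",
                  "apple", "orange", "orange", "grape"],
        "amount": [3, 5, 10, 2, 10, 4, 6, 1],
    }
)

# One column per fruit, one row per date that appears in the data.
pivoted = df.pivot_table(index="date", columns="fruit", values="amount")

# Add the missing calendar days so that 3 rows really means 3 days.
all_days = pd.date_range(pivoted.index.min(), pivoted.index.max(), freq="D")
pivoted = pivoted.reindex(all_days).rename_axis("date")

# Mean over 3 days (NaN gaps are skipped), then shift by one day so each
# row only sees the 3 days before it.
avg_prev_3 = pivoted.rolling(window=3, min_periods=1).mean().shift(1)

print(avg_prev_3)
```

Here missing days are left as NaN and skipped, which matches the (3+5)/2 = 4 example in the question; filling them with 0 before rolling would divide by 3 instead.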

answered Sep 27 '22 17:09

Gustavo Linari Rodrigues
