I have the following dataframe:
In [1]: df
Out[1]:
ID Month Transaction_Amount
1 2013/01 10
1 2013/02 20
1 2013/03 10
1 2013/04 20
1 2013/05 10
1 2013/06 20
1 2013/07 10
1 2013/08 20
1 2013/09 10
1 2013/10 20
1 2013/11 10
1 2013/12 20
1 2014/01 15
1 2014/02 25
1 2014/03 15
1 2014/04 25
...
1 2014/11 15
1 2014/12 25
...
10000000 2014/11 13
10000000 2014/12 23
What I would like to do is calculate the growth over rolling month periods year over year. For example, I would want to find the value of (2014/01 - 2013/01) / (2014/01), which is (15 - 10) / 15 = 1/3, and save this for the first rolling period. There will be a total of 12 rolling periods for each ID. I'm thinking that the final output should look like:
In [2]: df_new
Out[2]:
ID rolling_period_1 rolling_period_2 ... rolling_period_12
1 .333333 .25 ... .25
2 x1 x2 ... x12
3 y1 y2 ... y12
4 z1 z2 ... z12
...
I generated a list containing tuples for every year-over-year period, [(2013/01, 2014/01), (2013/02, 2014/02), ..., (2013/12, 2014/12)], and have been playing around with isin to index a subset of the original df, but I am unsure how to arrive at df_new.
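For illustration, one way such a list of month pairs could be built (a sketch, assuming Month is stored as 'YYYY/MM' strings, as in the sample data above):

import pandas as pd

# Build (month, same month one year later) pairs for 2013 -> 2014
months_2013 = pd.period_range('2013-01', '2013-12', freq='M')
period_pairs = [(m.strftime('%Y/%m'), (m + 12).strftime('%Y/%m')) for m in months_2013]
# [('2013/01', '2014/01'), ('2013/02', '2014/02'), ..., ('2013/12', '2014/12')]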
EDIT
I have created a new dataframe called temp_df with the following code:
In [4]: temp_df = df[df['Month'].isin(('2013/01', '2014/01'))]
In [5]: temp_df
Out[5]:
ID Month Transaction_Amount
1 2013/01 10
1 2014/01 15
2 2013/01 20
2 2014/01 30
3 2013/01 15
3 2014/01 30
...
What I would like to produce is a DataFrame that looks like the following:
In [6]: new_df
Out[6]:
ID Transaction_Growth
1 .3333 # (15-10)/15
2 .3333 # (30-20)/30
3 .50 # (30-15)/30
...
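For reference, one sketch of the per-ID calculation on temp_df (assuming exactly two rows per ID and that Month sorts chronologically as 'YYYY/MM' strings) could be:

# Growth as defined above: (later value - earlier value) / later value
def growth(s):
    return (s.iloc[-1] - s.iloc[0]) / s.iloc[-1]

new_df = (temp_df.sort_values('Month')
                 .groupby('ID')['Transaction_Amount']
                 .apply(growth)
                 .rename('Transaction_Growth')
                 .reset_index())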
I think there is a much simpler way to do this that doesn't require keeping track of shifting time periods: try using the df.pct_change() method:
import pandas as pd
import numpy as np

# Monthly PeriodIndex covering two full years plus one month
date_range = pd.period_range("2016-01", "2018-01", freq='M')
df = pd.DataFrame({'A': np.random.rand(len(date_range))}, index=date_range)

# Month-over-month growth: (current - previous) / previous
df['pct_pop'] = df['A'].pct_change()
# Year-over-year growth: compare each month with the value 12 rows earlier
df['pct_yoy'] = df['A'].pct_change(12)
df
A pct_pop pct_yoy
2016-01 0.478381 NaN NaN
2016-02 0.941450 0.967991 NaN
2016-03 0.128445 -0.863567 NaN
2016-04 0.498623 2.882011 NaN
2016-05 0.914663 0.834377 NaN
2016-06 0.349565 -0.617821 NaN
2016-07 0.563296 0.611419 NaN
2016-08 0.144055 -0.744264 NaN
2016-09 0.502279 2.486708 NaN
2016-10 0.621283 0.236928 NaN
2016-11 0.716813 0.153763 NaN
2016-12 0.152372 -0.787431 NaN
2017-01 0.160636 0.054234 -0.664209
2017-02 0.496759 2.092453 -0.472347
2017-03 0.324318 -0.347132 1.524965
2017-04 0.431651 0.330949 -0.134315
2017-05 0.973095 1.254357 0.063884
2017-06 0.007917 -0.991864 -0.977351
2017-07 0.875365 109.562870 0.554005
2017-08 0.860987 -0.016425 4.976784
2017-09 0.099549 -0.884378 -0.801805
2017-10 0.544275 4.467398 -0.123950
2017-11 0.433326 -0.203846 -0.395482
2017-12 0.688057 0.587850 3.515636
2018-01 0.924038 0.342967 4.752374
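To map this back to the per-ID data in the question, a rough sketch (assuming each ID has one row per month with no gaps, and that Month sorts chronologically as 'YYYY/MM' strings) is to shift within each ID group. Note that pct_change divides by the previous value, whereas the question's formula divides by the current value, so the ratio is formed explicitly here:

# Sort so that shifting 12 rows within an ID lands on the same month one year earlier
df = df.sort_values(['ID', 'Month'])
prev_year = df.groupby('ID')['Transaction_Amount'].shift(12)

# Growth as defined in the question: (current - year-ago) / current
df['growth'] = (df['Transaction_Amount'] - prev_year) / df['Transaction_Amount']

# Keep the twelve 2014 rows per ID and spread them into rolling_period_1 ... rolling_period_12
wide = df.dropna(subset=['growth']).copy()
wide['period'] = 'rolling_period_' + (wide.groupby('ID').cumcount() + 1).astype(str)
df_new = wide.pivot(index='ID', columns='period', values='growth')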