Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating year over year growth by group in Pandas

I have the following dataframe:

In [1]: df
Out[1]: 

ID            Month           Transaction_Amount

1             2013/01         10
1             2013/02         20
1             2013/03         10
1             2013/04         20
1             2013/05         10
1             2013/06         20
1             2013/07         10
1             2013/08         20
1             2013/09         10
1             2013/10         20
1             2013/11         10
1             2013/12         20
1             2014/01         15
1             2014/02         25
1             2014/03         15
1             2014/04         25
...
1             2014/11         15
1             2014/12         25
...
10000000      2014/11         13
10000000      2014/12         23

What I would like to do is calculate the growth over rolling month periods year over year, so for example, I would want to find the value of (2014/01 - 2013/01) / (2014/01) which is (15 - 10) / (15) = 1/3 and save this for the first rolling period. There will be a total of 12 rolling periods for each ID. I'm thinking that the final output should look like:

In [2]: df_new
Out[2]: 

ID       rolling_period_1   rolling_period_2  ... rolling_period_12

1        .333333            .25                   .25
2        x1                 x2                    x12
3        y1                 y2                    y12
4        z1                 z2                    z12
...

I generated a list containing tuples of every year over period [(2013/01, 2014/01), (2013/02, 2014/02) ... (2013/12, 2014/12)] and have been playing around with isin to index a subset of the original df, but I am unsure how to arrive at the df_new.

EDIT

I have created a new dataframe called temp_df with the following code:

In [4]: temp_df = df[df['month'].isin(('2013/01','2014/01'))]

In [5]: temp_df
Out[5]:

ID            Month           Transaction_Amount

1             2013/01         10
1             2014/01         15
2             2013/01         20
2             2014/01         30
3             2013/01         15
3             2014/01         30
...

What I would like to produce is a DataFrame that looks like the following:

In [6]: new_df
Out[6]:

ID            Transaction_Growth

1             .3333   # (15-10)/15
2             .3333   # (30-20)/30
3             .50     # (30-15)/30
...
like image 238
invoker Avatar asked Feb 04 '15 18:02

invoker


People also ask

How to calculate the growth over rolling month periods?

What I would like to do is calculate the growth over rolling month periods year over year, so for example, I would want to find the value of (2014/01 - 2013/01) / (2014/01)which is (15 - 10) / (15) = 1/3and save this for the first rolling period. There will be a total of 12 rolling periods for each ID.

How to aggregate data in pandas?

Pandas also comes with an additional method, .agg (), which allows us to apply multiple aggregations in the .groupby () method. The method allows us to pass in a list of callables (i.e., the function part without the parentheses). Let’s see how we can apply some of the functions that come with the numpy library to aggregate our data.

How to group by date and time using PANDAS grouper class?

Below are some examples that depict how to group by a dataframe on the basis of date and time using pandas Grouper class. In the above example, the dataframe is groupby by the Date column. As we have provided freq = ‘M’ which means month, so the data is grouped month-wise till the last date of every month and provided sum of price column.

How do I Group by a column in pandas?

Pandas GroupBy allows us to specify a groupby instruction for an object. This specified instruction will select a column via the key parameter of the grouper function along with the level and/or axis parameters if given, a level of the index of the target object/column.


1 Answers

I think there is a much more simple way to do this that doesn't require keeping track of shifting time periods, try using the df.pct_change() method:

import pandas as pd
import numpy as np
date_range = pd.period_range("2016-01", "2018-01",freq='m')
df= pd.DataFrame({'A':np.random.rand(len(date_range))}, index=date_range)
df['pct_pop'] = df['A'].pct_change()
df['pct_yoy'] = df['A'].pct_change(12)
df

               A    pct_pop     pct_yoy
2016-01 0.478381    NaN NaN
2016-02 0.941450    0.967991    NaN
2016-03 0.128445    -0.863567   NaN
2016-04 0.498623    2.882011    NaN
2016-05 0.914663    0.834377    NaN
2016-06 0.349565    -0.617821   NaN
2016-07 0.563296    0.611419    NaN
2016-08 0.144055    -0.744264   NaN
2016-09 0.502279    2.486708    NaN
2016-10 0.621283    0.236928    NaN
2016-11 0.716813    0.153763    NaN
2016-12 0.152372    -0.787431   NaN
2017-01 0.160636    0.054234    -0.664209
2017-02 0.496759    2.092453    -0.472347
2017-03 0.324318    -0.347132   1.524965
2017-04 0.431651    0.330949    -0.134315
2017-05 0.973095    1.254357    0.063884
2017-06 0.007917    -0.991864   -0.977351
2017-07 0.875365    109.562870  0.554005
2017-08 0.860987    -0.016425   4.976784
2017-09 0.099549    -0.884378   -0.801805
2017-10 0.544275    4.467398    -0.123950
2017-11 0.433326    -0.203846   -0.395482
2017-12 0.688057    0.587850    3.515636
2018-01 0.924038    0.342967    4.752374
like image 152
Jonathan Avatar answered Oct 26 '22 13:10

Jonathan