Pandas groupby and calculate percentage change

Tags:

python

pandas

I am following the question "How to create rolling percentage for groupby DataFrame" and tried the code below:

import pandas as pd

data = [
    ('product_a','1/31/2014',53)
    ,('product_b','1/31/2014',44)
    ,('product_c','1/31/2014',36)
    ,('product_a','11/30/2013',52)
    ,('product_b','11/30/2013',43)
    ,('product_c','11/30/2013',35)
    ,('product_a','3/31/2014',50)
    ,('product_b','3/31/2014',41)
    ,('product_c','3/31/2014',34)
    ,('product_a','12/31/2013',50)
    ,('product_b','12/31/2013',41)
    ,('product_c','12/31/2013',34)
    ,('product_a','2/28/2014',52)
    ,('product_b','2/28/2014',43)
    ,('product_c','2/28/2014',35)]

product_df = pd.DataFrame( data, columns=['prod_desc','activity_month','prod_count'] )
product_df.sort_values('activity_month', inplace = True, ascending=False) 
product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change() + 1

print(product_df)

However, I am not able to reproduce the output shown in the suggested answer.

This is the output I get:

    prod_desc activity_month  prod_count    pct_ch
0   product_a      1/31/2014          53       NaN
1   product_b      1/31/2014          44  0.830189
2   product_c      1/31/2014          36  0.818182
3   product_a     11/30/2013          52  1.444444
4   product_b     11/30/2013          43  0.826923
5   product_c     11/30/2013          35  0.813953
9   product_a     12/31/2013          50  1.428571
10  product_b     12/31/2013          41  0.820000
11  product_c     12/31/2013          34  0.829268
12  product_a      2/28/2014          52  1.529412
13  product_b      2/28/2014          43  0.826923
14  product_c      2/28/2014          35  0.813953
6   product_a      3/31/2014          50  1.428571
7   product_b      3/31/2014          41  0.820000
8   product_c      3/31/2014          34  0.829268
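
The pct_ch values above look like ratios against the previous row of the whole frame, not against the previous row of the same product: for example 44/53 is roughly 0.830189, which pairs product_b with product_a. A minimal sketch that checks this for the first six rows, using a plain Series with no grouping at all (the variable names are only illustrative):

```python
import pandas as pd

# prod_count for the first six rows of the output above, in displayed order.
prod_count = pd.Series([53, 44, 36, 52, 43, 35])

# An ungrouped pct_change compares each row with the row directly above it,
# regardless of which product the rows belong to.
ungrouped = (prod_count.pct_change() + 1).round(6)
print(ungrouped.tolist())
```

If the printed values match the pct_ch column above, the grouping was effectively ignored when the column was computed.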

The expected output should look similar to the one below: the percentage change should be calculated separately within each prod_desc (product_a, product_b and product_c) instead of across the whole column.

 product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051

Thank you in advance

Platalea Minor asked Jan 23 '19 07:01



1 Answer

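Convert activity_month to datetimes so that the sort is chronological rather than alphabetical, sort within each product, and then use GroupBy.apply with Series.pct_change: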

product_df['activity_month'] = pd.to_datetime(product_df['activity_month'])
product_df.sort_values(['prod_desc','activity_month'], inplace = True, ascending=[True, False])

product_df['pct_ch'] = (product_df.groupby('prod_desc')['prod_count']
                                  .apply(pd.Series.pct_change) + 1)
print(product_df)

    prod_desc activity_month  prod_count    pct_ch
6   product_a     2014-03-31          50       NaN
12  product_a     2014-02-28          52  1.040000
0   product_a     2014-01-31          53  1.019231
9   product_a     2013-12-31          50  0.943396
3   product_a     2013-11-30          52  1.040000
7   product_b     2014-03-31          41       NaN
13  product_b     2014-02-28          43  1.048780
1   product_b     2014-01-31          44  1.023256
10  product_b     2013-12-31          41  0.931818
4   product_b     2013-11-30          43  1.048780
8   product_c     2014-03-31          34       NaN
14  product_c     2014-02-28          35  1.029412
2   product_c     2014-01-31          36  1.028571
11  product_c     2013-12-31          34  0.944444
5   product_c     2013-11-30          35  1.029412
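
For completeness, here is a self-contained sketch of the same idea that calls pct_change directly on the grouped column instead of going through apply. It assumes a pandas version in which SeriesGroupBy.pct_change is computed within each group (the output in the question suggests that was not the case there), and the explicit date format is an assumption based on the sample data:

```python
import pandas as pd

data = [
    ("product_a", "1/31/2014", 53), ("product_b", "1/31/2014", 44),
    ("product_c", "1/31/2014", 36), ("product_a", "11/30/2013", 52),
    ("product_b", "11/30/2013", 43), ("product_c", "11/30/2013", 35),
    ("product_a", "3/31/2014", 50), ("product_b", "3/31/2014", 41),
    ("product_c", "3/31/2014", 34), ("product_a", "12/31/2013", 50),
    ("product_b", "12/31/2013", 41), ("product_c", "12/31/2013", 34),
    ("product_a", "2/28/2014", 52), ("product_b", "2/28/2014", 43),
    ("product_c", "2/28/2014", 35),
]
product_df = pd.DataFrame(data, columns=["prod_desc", "activity_month", "prod_count"])

# Parse the month strings so that sorting is chronological, not alphabetical.
product_df["activity_month"] = pd.to_datetime(
    product_df["activity_month"], format="%m/%d/%Y"
)

# Same ordering as the answer above: by product, newest month first.
product_df = product_df.sort_values(
    ["prod_desc", "activity_month"], ascending=[True, False]
)

# Ratio of each row to the previous row of the same product.
product_df["pct_ch"] = product_df.groupby("prod_desc")["prod_count"].pct_change() + 1
print(product_df)
```

With the newest month first, each ratio compares a month with the following one. Passing ascending=[True, True] instead would compare each month with the one before it, which is closer to the layout of the expected output in the question.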
jezrael answered Sep 22 '22 08:09