I am working from the question "How to create rolling percentage for groupby DataFrame":
import pandas as pd
data = [
('product_a','1/31/2014',53)
,('product_b','1/31/2014',44)
,('product_c','1/31/2014',36)
,('product_a','11/30/2013',52)
,('product_b','11/30/2013',43)
,('product_c','11/30/2013',35)
,('product_a','3/31/2014',50)
,('product_b','3/31/2014',41)
,('product_c','3/31/2014',34)
,('product_a','12/31/2013',50)
,('product_b','12/31/2013',41)
,('product_c','12/31/2013',34)
,('product_a','2/28/2014',52)
,('product_b','2/28/2014',43)
,('product_c','2/28/2014',35)]
product_df = pd.DataFrame( data, columns=['prod_desc','activity_month','prod_count'] )
product_df.sort_values('activity_month', inplace = True, ascending=False)
product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change() + 1
print(product_df)
However, I am not able to reproduce the output shown in the suggested answer.
This is the output I get:
prod_desc activity_month prod_count pct_ch
0 product_a 1/31/2014 53 NaN
1 product_b 1/31/2014 44 0.830189
2 product_c 1/31/2014 36 0.818182
3 product_a 11/30/2013 52 1.444444
4 product_b 11/30/2013 43 0.826923
5 product_c 11/30/2013 35 0.813953
9 product_a 12/31/2013 50 1.428571
10 product_b 12/31/2013 41 0.820000
11 product_c 12/31/2013 34 0.829268
12 product_a 2/28/2014 52 1.529412
13 product_b 2/28/2014 43 0.826923
14 product_c 2/28/2014 35 0.813953
6 product_a 3/31/2014 50 1.428571
7 product_b 3/31/2014 41 0.820000
8 product_c 3/31/2014 34 0.829268
The expected output should look similar to the one below: the percentage change should be calculated separately within each prod_desc (product_a, product_b and product_c), not across the whole column.
product_desc activity_month prod_count pct_ch
0 product_a 2014-01-01 53 NaN
3 product_a 2014-02-01 26 0.490566
6 product_a 2014-03-01 41 1.576923
1 product_b 2014-01-01 42 NaN
4 product_b 2014-02-01 48 1.142857
7 product_b 2014-03-01 35 0.729167
2 product_c 2014-01-01 38 NaN
5 product_c 2014-02-01 39 1.026316
8 product_c 2014-03-01 50 1.282051
Thank you in advance
You can calculate the percentage of total within each group of a pandas DataFrame using methods such as DataFrame.groupby(), DataFrame.agg(), and DataFrame.transform().
Pandas DataFrame pct_change() method: pct_change() returns a DataFrame with the change between the values in each row and, by default, the previous row, expressed as a fraction (multiply by 100 for a percentage). The periods parameter specifies how many rows back to compare with.
The pct_change() function computes the percentage change (as a fraction) between the current element and a prior element, by default the immediately previous row. This is useful for comparing changes across a time series.
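To make the default behaviour and the periods parameter concrete, here is a minimal sketch. The series values are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical monthly counts, used only for illustration.
counts = pd.Series([50, 55, 44, 66], name="prod_count")

# Default (periods=1): compare each value with the immediately previous one.
step_1 = counts.pct_change()

# periods=2: compare each value with the one two rows earlier.
step_2 = counts.pct_change(periods=2)

print(step_1)
print(step_2)
```

The results are fractions, so a value of roughly 0.1 means a 10% increase. The first row (or the first two rows with periods=2) is NaN because there is nothing earlier to compare with.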
A percentage of total is calculated by dividing a value by the sum of all the values and then multiplying the result by 100. The same formula applies to pandas DataFrames.
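One possible way to apply that formula per group is sketched below, using groupby with transform("sum") so the group total lines up with every row. The data and the column name pct_of_total are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_a", "product_b", "product_b"],
    "prod_count": [30, 70, 20, 60],
})

# transform("sum") returns the group total repeated for each row of the group,
# so it has the same index as df and can be used in element-wise arithmetic.
group_total = df.groupby("prod_desc")["prod_count"].transform("sum")

# value / sum of the group's values * 100
df["pct_of_total"] = df["prod_count"] / group_total * 100

print(df)
```

Note that this is a different quantity from pct_change(): it compares each row with its group total, not with the previous row.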
Use GroupBy.apply with Series.pct_change:
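# Parse the month strings into real dates so they sort chronologically, not as text
product_df['activity_month'] = pd.to_datetime(product_df['activity_month'])
# Sort by product, then by month with the newest month first
product_df.sort_values(['prod_desc','activity_month'], inplace = True, ascending=[True, False])
# group_keys=False keeps the original row index so the result aligns with product_df on assignment
product_df['pct_ch'] = (product_df.groupby('prod_desc', group_keys=False)['prod_count']
                        .apply(pd.Series.pct_change) + 1)
print(product_df)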
prod_desc activity_month prod_count pct_ch
6 product_a 2014-03-31 50 NaN
12 product_a 2014-02-28 52 1.040000
0 product_a 2014-01-31 53 1.019231
9 product_a 2013-12-31 50 0.943396
3 product_a 2013-11-30 52 1.040000
7 product_b 2014-03-31 41 NaN
13 product_b 2014-02-28 43 1.048780
1 product_b 2014-01-31 44 1.023256
10 product_b 2013-12-31 41 0.931818
4 product_b 2013-11-30 43 1.048780
8 product_c 2014-03-31 34 NaN
14 product_c 2014-02-28 35 1.029412
2 product_c 2014-01-31 36 1.028571
11 product_c 2013-12-31 34 0.944444
5 product_c 2013-11-30 35 1.029412
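As an alternative sketch, the grouped object also has its own pct_change() method, which avoids apply altogether. The example below uses a subset of the question's data and sorts the months in ascending order, so each month is compared with the month before it (as in the expected output in the question). The choice of ascending order and the explicit date format are assumptions, not part of the answer above.

```python
import pandas as pd

# Subset of the data from the question.
data = [
    ("product_a", "1/31/2014", 53),
    ("product_a", "12/31/2013", 50),
    ("product_a", "2/28/2014", 52),
    ("product_b", "1/31/2014", 44),
    ("product_b", "12/31/2013", 41),
    ("product_b", "2/28/2014", 43),
]
df = pd.DataFrame(data, columns=["prod_desc", "activity_month", "prod_count"])

# Parse the dates first: sorting the raw strings is lexicographic,
# which would place "12/31/2013" after "1/31/2014".
df["activity_month"] = pd.to_datetime(df["activity_month"], format="%m/%d/%Y")

# Oldest month first within each product.
df = df.sort_values(["prod_desc", "activity_month"])

# Change relative to the previous month of the same product, as a ratio.
df["pct_ch"] = df.groupby("prod_desc")["prod_count"].pct_change() + 1

print(df)
```

The first month of each product is NaN because it has no earlier month within its group.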