region year val
1.0 2015.0 6.775457e+05
1.0 2016.0 6.819761e+05
1.0 2017.0 6.864065e+05
2.0 2015.0 6.175457e+05
2.0 2016.0 6.419761e+05
3.0 2017.0 6.564065e+05
In the dataframe above, I want to compute the percentage difference between consecutive rows, but only within the same region. I tried the following, but I am not sure whether it works. What is the best way to achieve this?
df.groupby(['region', 'year'])['val'].pct_change()
The pct_change() method returns the fractional change between each row's value and, by default, the previous row's value; multiply the result by 100 to get a percentage. Which row to compare with can be set with the periods parameter.
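As a small illustration of the periods parameter, here is a sketch on a made-up Series (the values are invented for this example and do not come from the question):

```python
import pandas as pd

# Hypothetical values that grow by 10% per step.
s = pd.Series([100.0, 110.0, 121.0, 133.1])

# Default (periods=1): compare each value with the one directly before it.
step1 = s.pct_change()

# periods=2: compare each value with the one two rows back.
step2 = s.pct_change(periods=2)

print(step1.round(4).tolist())
print(step2.round(4).tolist())
```

The first `periods` entries are always NaN, because those rows have no earlier row to compare with.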
You can use the DataFrame.diff() function to find the difference between two rows in a pandas DataFrame. Its periods parameter sets how many rows back to compare with (the default is 1).
The pandas diff method allows us to easily subtract one row from another in a pandas DataFrame. By default, pandas subtracts the previous row from each row.
To calculate a percentage in Python, use the division operator (/) to get the quotient from two numbers and then multiply this quotient by 100 using the multiplication operator (*) to get the percentage.
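To connect diff, division and the multiplication by 100, here is a minimal sketch that builds the percentage change by hand within each group and compares it with pct_change. The small dataframe and the names `prev`, `manual_pct` and `builtin_pct` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: two regions with two rows each.
df = pd.DataFrame({
    "region": [1, 1, 2, 2],
    "val": [200.0, 250.0, 80.0, 60.0],
})

grouped = df.groupby("region")["val"]

# Previous value within the same region (NaN for the first row of a region).
prev = grouped.shift()

# Difference to the previous row, divided by the previous value, times 100.
manual_pct = grouped.diff() / prev * 100

# The built-in method gives a fraction, so scale it to a percentage as well.
builtin_pct = grouped.pct_change() * 100

print(pd.DataFrame({"manual": manual_pct, "builtin": builtin_pct}))
```

Both columns should agree, which is why pct_change is usually the shorter way to write this.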
You can use DataFrameGroupBy.pct_change, grouping by the region column:
df['new'] = df.groupby('region')['val'].pct_change()
print(df)
region year val new
0 1.0 2015.0 677545.7 NaN
1 1.0 2016.0 681976.1 0.006539
2 1.0 2017.0 686406.5 0.006496
3 2.0 2015.0 617545.7 NaN
4 2.0 2016.0 641976.1 0.039560
5 3.0 2017.0 656406.5 NaN
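For completeness, here is a self-contained sketch that rebuilds the question's data and shows why grouping by both region and year does not give the desired result. The column name `pct` and the sort step are additions for this example; the sort assumes the rows should be compared in year order:

```python
import pandas as pd

df = pd.DataFrame({
    "region": [1.0, 1.0, 1.0, 2.0, 2.0, 3.0],
    "year": [2015.0, 2016.0, 2017.0, 2015.0, 2016.0, 2017.0],
    "val": [677545.7, 681976.1, 686406.5, 617545.7, 641976.1, 656406.5],
})

# The attempt from the question: each (region, year) pair occurs only once,
# so every group holds a single row and there is no previous row to compare with.
attempt = df.groupby(["region", "year"])["val"].pct_change()
print(attempt.isna().all())  # True

# Sort so that rows within a region are in year order, then group by region only.
df = df.sort_values(["region", "year"])

# Multiply by 100 to turn the fraction into a percentage.
df["pct"] = df.groupby("region")["val"].pct_change() * 100
print(df.round({"pct": 4}))
```

The first row of every region stays NaN, and a region with a single row (region 3 here) has no percentage change at all.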