For each Name
in the following dataframe I'm trying to find the percentage change from one Time
to the next of the Amount
column:
Code to create the dataframe:
import pandas as pd
df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df.sort_values(['Name', 'Time'], inplace = True)
The first approach I tried (based on this question and answer) used groupby
and pct_change
:
df['pct_change'] = df.groupby(['Name'])['Amount'].pct_change()
With the result:
This doesn't seem to be grouping by the name because it is the same result as if I had used no groupby
and called df['Amount'].pct_change()
. According to the Pandas Documentation for pandas.core.groupby.DataFrameGroupBy.pct_change
, the above approach should work to calculate the percentage change of each value to the previous value within a group.
For a second approach I used groupby
with apply
and pct_change
:
df['pct_change_with_apply'] = df.groupby('Name')['Amount'].apply(lambda x: x.pct_change())
With the result:
This time all the percentage changes are correct.
Why does the groupby
and pct_change
approach not return the correct values, but using groupby
with apply
does?
Edit January 28, 2018: This behavior has been corrected in the latest version of Pandas, 0.24.0. To install run pip install -U pandas
.
As already noted by @piRSquared in the comments; this is due to a bug filed on Github under issue #21621. It already looks to be solved in milestone 0.24.0
(due 2018-12-31). My version (0.23.4
) still displayed this bugged behaviour.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With