Pandas 0.23 groupby and pct change not returning expected value

For each Name in the dataframe below, I'm trying to find the percentage change in the Amount column from one Time to the next.

Code to create the dataframe:

import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})

df.sort_values(['Name', 'Time'], inplace=True)

The first approach I tried (based on this question and answer) used groupby and pct_change:

df['pct_change'] = df.groupby(['Name'])['Amount'].pct_change()

The result (shown as an image in the original post) doesn't seem to be grouped by name: it is identical to what I get with no groupby at all, calling df['Amount'].pct_change(). According to the pandas documentation for pandas.core.groupby.DataFrameGroupBy.pct_change, the approach above should compute the percentage change of each value relative to the previous value within its group.
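For reference, the expected per-group values can be worked out by hand with a grouped shift and compared against the ungrouped call. This is only a sketch built on the dataframe from the question; the column names pct_change_manual and pct_change_ungrouped are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# Previous Amount within the same Name; NaN on the first row of each group.
previous = df.groupby('Name')['Amount'].shift(1)

# Percentage change computed by hand: current / previous - 1.
df['pct_change_manual'] = df['Amount'] / previous - 1

# Ungrouped call for comparison: it carries on across Name boundaries,
# so only the very first row of the frame is NaN.
df['pct_change_ungrouped'] = df['Amount'].pct_change()

print(df)
```

The two columns agree inside each group and differ only on the first row of Cala and Elena, where the ungrouped version compares against the last row of the previous name.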

For a second approach I used groupby with apply and pct_change:

df['pct_change_with_apply'] = df.groupby('Name')['Amount'].apply(lambda x: x.pct_change())

With this result (also shown as an image in the original post), all the percentage changes are correct.
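A closely related variant uses transform instead of apply. It is sketched here as an alternative, not as what the question used: transform returns a result aligned to the original index, which makes the column assignment safe. In recent pandas versions, apply may prepend the group keys to the index, in which case assigning its result straight back to the dataframe can fail. The column name pct_change_with_transform is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# Run Series.pct_change separately on each group's Amount values;
# transform hands back a Series with the same index as df.
df['pct_change_with_transform'] = (
    df.groupby('Name')['Amount'].transform(lambda s: s.pct_change())
)

print(df)
```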

Why does the groupby and pct_change approach not return the correct values, but using groupby with apply does?

Edit January 28, 2019: This behavior has been corrected in the latest version of Pandas, 0.24.0. To upgrade, run pip install -U pandas.

asked Jun 28 '18 14:06 by willk


1 Answer

As already noted by @piRSquared in the comments, this is due to a bug filed on GitHub as issue #21621. The fix appears to be scheduled for milestone 0.24.0 (due 2018-12-31). My version (0.23.4) still showed the buggy behaviour.
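To see whether a given pandas installation is affected, one option is to compare the direct groupby pct_change against a per-group computation on a tiny frame. This is a sketch only; the helper name groupby_pct_change_respects_groups and the test data are invented for illustration.

```python
import numpy as np
import pandas as pd


def groupby_pct_change_respects_groups():
    """Return True if groupby(...).pct_change() restarts at each group."""
    df = pd.DataFrame({'Name': ['a', 'a', 'b', 'b'],
                       'Amount': [1.0, 2.0, 10.0, 5.0]})
    grouped = df.groupby('Name')['Amount']

    # The call reported as buggy in the question.
    direct = grouped.pct_change()

    # Reference: pct_change applied to each group on its own.
    reference = grouped.transform(lambda s: s.pct_change())

    # NaNs (first row of each group) count as equal when they line up.
    return bool(np.allclose(direct, reference, equal_nan=True))


respects_groups = groupby_pct_change_respects_groups()
print(pd.__version__, respects_groups)
```

On a version with the bug, the first row of group b would come out as 4.0 (10.0 compared against 2.0) instead of NaN, so the check would return False.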

answered Oct 22 '22 07:10 by jorijnsmit