I'm trying to find the period-over-period growth in Value
for each unique group, grouped by (Company
, Group
, and Date
).
Company Group Date Value
A X 2015-01 1
A X 2015-02 2
A X 2015-03 1.5
A XX 2015-01 1
A XX 2015-02 1.5
A XX 2015-03 0.75
A XX 2015-04 1
B Y 2015-01 1
B Y 2015-02 1.5
B Y 2015-03 2
B Y 2015-04 3
B YY 2015-01 2
B YY 2015-02 2.5
B YY 2015-03 3
I've tried:
df.groupby(['Date','Company','Group']).pct_change()
but this returns all NaN.
The result I'm looking for is:
Company Group Date Value/People
A X 2015-01 NaN
A X 2015-02 1.0
A X 2015-03 -0.25
A XX 2015-01 NaN
A XX 2015-02 0.5
A XX 2015-03 -0.5
A XX 2015-04 0.33
B Y 2015-01 NaN
B Y 2015-02 0.5
B Y 2015-03 0.33
B Y 2015-04 0.5
B YY 2015-01 NaN
B YY 2015-02 0.25
B YY 2015-03 0.2
groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.
The pct_change() method returns a DataFrame with the percentage difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.
Pandas dataframe has groupby([column(s)]). first() method which is used to get the first record from each group. The result of grouby.
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.
you want to get your date into the row index and groups/company into the columns
d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1
then use pct_change
d1.pct_change()
OR
with groupby
df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()
df
I'm not sure the groupby
method works as intended as of Pandas 0.23.4 at least.
df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()
Produces this, which is incorrect for purposes of the question:
The Index+Stack method still works as intended, but you need to do additional merges to get it into the original form requested.
d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1 = d1.pct_change().stack([0,1]).reset_index()
df = df.merge(d1, on=['Company', 'Group', 'Date'], how='left')
df.rename(columns={0: 'pct'}, inplace=True)
df
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With