Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby multiple columns, with pct_change

I'm trying to find the period-over-period growth in Value for each unique group, grouped by (Company, Group, and Date).

Company Group Date     Value
A       X     2015-01  1
A       X     2015-02  2
A       X     2015-03  1.5
A       XX    2015-01  1
A       XX    2015-02  1.5
A       XX    2015-03  0.75
A       XX    2015-04  1
B       Y     2015-01  1
B       Y     2015-02  1.5
B       Y     2015-03  2
B       Y     2015-04  3
B       YY    2015-01  2
B       YY    2015-02  2.5
B       YY    2015-03  3

I've tried:

df.groupby(['Date','Company','Group']).pct_change()

but this returns all NaN.

The result I'm looking for is:

Company Group Date     Value/People
A       X     2015-01  NaN
A       X     2015-02  1.0
A       X     2015-03  -0.25
A       XX    2015-01  NaN
A       XX    2015-02  0.5
A       XX    2015-03  -0.5
A       XX    2015-04  0.33
B       Y     2015-01  NaN
B       Y     2015-02  0.5
B       Y     2015-03  0.33
B       Y     2015-04  0.5
B       YY    2015-01  NaN
B       YY    2015-02  0.25
B       YY    2015-03  0.2
like image 423
user3357979 Avatar asked Oct 26 '16 22:10

user3357979


People also ask

Can you use Groupby with multiple columns in pandas?

groupby() can take the list of columns to group by multiple columns and use the aggregate functions to apply single or multiple aggregations at the same time.

What does Pct_change do in pandas?

The pct_change() method returns a DataFrame with the percentage difference between the values for each row and, by default, the previous row. Which row to compare with can be specified with the periods parameter.

What does .first do in Groupby?

Pandas dataframe has groupby([column(s)]). first() method which is used to get the first record from each group. The result of grouby.

What does DF Groupby (' year ') do?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.


2 Answers

you want to get your date into the row index and groups/company into the columns

d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1

enter image description here

then use pct_change

d1.pct_change()

enter image description here

OR

with groupby

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()
df

enter image description here

like image 71
piRSquared Avatar answered Sep 25 '22 16:09

piRSquared


I'm not sure the groupby method works as intended as of Pandas 0.23.4 at least.

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()

Produces this, which is incorrect for purposes of the question:

Incorrect Outcome

The Index+Stack method still works as intended, but you need to do additional merges to get it into the original form requested.

d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1 = d1.pct_change().stack([0,1]).reset_index()
df = df.merge(d1, on=['Company', 'Group', 'Date'], how='left')
df.rename(columns={0: 'pct'}, inplace=True)
df

Correct Outcome

like image 20
SimonR Avatar answered Sep 24 '22 16:09

SimonR