Pandas groupby multiple columns, with pct_change

Tags: python, pandas, pandas-groupby

I'm trying to find the period-over-period growth in Value for each unique group, grouped by (Company, Group, and Date).

Company Group Date     Value
A       X     2015-01  1
A       X     2015-02  2
A       X     2015-03  1.5
A       XX    2015-01  1
A       XX    2015-02  1.5
A       XX    2015-03  0.75
A       XX    2015-04  1
B       Y     2015-01  1
B       Y     2015-02  1.5
B       Y     2015-03  2
B       Y     2015-04  3
B       YY    2015-01  2
B       YY    2015-02  2.5
B       YY    2015-03  3

I've tried:

df.groupby(['Date','Company','Group']).pct_change()

but this returns all NaN.

The result I'm looking for is:

Company Group Date     Value/People
A       X     2015-01  NaN
A       X     2015-02  1.0
A       X     2015-03  -0.25
A       XX    2015-01  NaN
A       XX    2015-02  0.5
A       XX    2015-03  -0.5
A       XX    2015-04  0.33
B       Y     2015-01  NaN
B       Y     2015-02  0.5
B       Y     2015-03  0.33
B       Y     2015-04  0.5
B       YY    2015-01  NaN
B       YY    2015-02  0.25
B       YY    2015-03  0.2

asked Oct 26 '16 22:10

user3357979

2 Answers

You want to get the dates into the row index and the company/group pairs into the columns:

d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1

Then use pct_change:

d1.pct_change()
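
As a self-contained sketch of the unstack approach, the sample data from the question can be rebuilt and run end to end. The names rows, wide and wide_pct are illustrative only. Passing fill_method=None is an addition that is not part of the snippet above: it keeps a month that is missing for a pair (for example A/X in 2015-04) as NaN, whereas older pandas versions forward-fill by default and would, as far as I know, report 0.0 there.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "X", "2015-01", 1.0), ("A", "X", "2015-02", 2.0), ("A", "X", "2015-03", 1.5),
    ("A", "XX", "2015-01", 1.0), ("A", "XX", "2015-02", 1.5), ("A", "XX", "2015-03", 0.75),
    ("A", "XX", "2015-04", 1.0),
    ("B", "Y", "2015-01", 1.0), ("B", "Y", "2015-02", 1.5), ("B", "Y", "2015-03", 2.0),
    ("B", "Y", "2015-04", 3.0),
    ("B", "YY", "2015-01", 2.0), ("B", "YY", "2015-02", 2.5), ("B", "YY", "2015-03", 3.0),
]
df = pd.DataFrame(rows, columns=["Company", "Group", "Date", "Value"])

# Wide form: one row per Date, one column per (Company, Group) pair.
wide = df.set_index(["Date", "Company", "Group"])["Value"].unstack(["Company", "Group"])

# Change from the previous row within each column.
# fill_method=None (an assumption of this sketch) leaves gaps as NaN.
wide_pct = wide.pct_change(fill_method=None)

print(wide_pct.round(2))
```

Because every pair gets its own column, the percent change can never leak from one Company/Group series into the next.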

Or, with groupby:

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()
df
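
A minimal sketch of why the attempt in the question returns all NaN, and of the groupby version on the question's sample data. Including Date in the group keys puts every row in its own single-row group, so there is never a previous row to compare with. The names rows and per_row are illustrative only, and the result assumes a pandas version in which groupby pct_change is computed within each group.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "X", "2015-01", 1.0), ("A", "X", "2015-02", 2.0), ("A", "X", "2015-03", 1.5),
    ("A", "XX", "2015-01", 1.0), ("A", "XX", "2015-02", 1.5), ("A", "XX", "2015-03", 0.75),
    ("A", "XX", "2015-04", 1.0),
    ("B", "Y", "2015-01", 1.0), ("B", "Y", "2015-02", 1.5), ("B", "Y", "2015-03", 2.0),
    ("B", "Y", "2015-04", 3.0),
    ("B", "YY", "2015-01", 2.0), ("B", "YY", "2015-02", 2.5), ("B", "YY", "2015-03", 3.0),
]
df = pd.DataFrame(rows, columns=["Company", "Group", "Date", "Value"])

# Grouping by Date as well: each (Date, Company, Group) key matches one row,
# so every group has length 1 and pct_change has nothing to compare against.
per_row = df.groupby(["Date", "Company", "Group"])["Value"].pct_change()

# Grouping by the series keys only, with rows ordered by Date inside each group.
df = df.sort_values(["Company", "Group", "Date"])
df["pct"] = df.groupby(["Company", "Group"])["Value"].pct_change()

print(per_row.isna().all())
print(df.round(2))
```

Date belongs in the sort order, not in the group keys: the keys identify a series, and the sort decides which row counts as the previous period.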

answered Sep 25 '22 16:09

piRSquared

I'm not sure the groupby method works as intended, at least as of pandas 0.23.4.

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()

This produces the following, which is incorrect for the purposes of the question:

[image: incorrect outcome]

The set_index/unstack method still works as intended, but you need an additional stack and merge to get the result back into the form originally requested:

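# Wide form: one row per Date, one column per (Company, Group) pair
d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
# Percent change down each column, then stack back to long form
d1 = d1.pct_change().stack([0,1]).reset_index()
# Attach the result to the original rows; the stacked values arrive in a column named 0
df = df.merge(d1, on=['Company', 'Group', 'Date'], how='left')
df.rename(columns={0: 'pct'}, inplace=True)
df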
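
If groupby pct_change cannot be trusted in a given pandas version, one alternative that stays in long form and needs no merge is to shift Value within each group and divide explicitly. This is a different technique from the unstack method above, offered only as a sketch: the names rows and prev are illustrative, and it has not been checked here against 0.23.4 itself.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("A", "X", "2015-01", 1.0), ("A", "X", "2015-02", 2.0), ("A", "X", "2015-03", 1.5),
    ("A", "XX", "2015-01", 1.0), ("A", "XX", "2015-02", 1.5), ("A", "XX", "2015-03", 0.75),
    ("A", "XX", "2015-04", 1.0),
    ("B", "Y", "2015-01", 1.0), ("B", "Y", "2015-02", 1.5), ("B", "Y", "2015-03", 2.0),
    ("B", "Y", "2015-04", 3.0),
    ("B", "YY", "2015-01", 2.0), ("B", "YY", "2015-02", 2.5), ("B", "YY", "2015-03", 3.0),
]
df = pd.DataFrame(rows, columns=["Company", "Group", "Date", "Value"])

# Order rows so that "previous row in the group" means "previous period".
df = df.sort_values(["Company", "Group", "Date"])

# Previous period's Value within each (Company, Group); NaN for the first row of a group.
prev = df.groupby(["Company", "Group"])["Value"].shift(1)

# Percent change written out by hand: current / previous - 1.
df["pct"] = df["Value"] / prev - 1

print(df.round(2))
```

The first row of each group has no previous value, so its pct is NaN, which matches the output requested in the question.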

[image: correct outcome]

answered Sep 24 '22 16:09

SimonR
