Merging a pandas groupby result back into DataFrame

Tags:

python

pandas

I have a DataFrame that looks like this...

   idn value  
0  ID1    25
1  ID1    30
2  ID2    30
3  ID2    50

I want to add another column to this frame that holds the max 'value' within each 'idn' group.

The result should look like this:

   idn value  max_val
0  ID1    25       30
1  ID1    30       30
2  ID2    30       50
3  ID2    50       50

I can extract the max of 'value' using a group by like so...

df[['idn', 'value']].groupby('idn')['value'].max()

However, I am unable to merge that result back into the original DataFrame.

What is the best way to get the desired result?

Thank You

Rob Kulseth asked Apr 15 '15 04:04




2 Answers

Use the transform method on a groupby object:

In [5]: df['maxval'] = df.groupby('idn')['value'].transform('max')

In [6]: df
Out[6]: 
   idn  value  maxval
0  ID1     25      30
1  ID1     30      30
2  ID2     30      50
3  ID2     50      50
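
For reference, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame from the question and selects the 'value' column before calling transform, so the result is a Series aligned to the original rows. The column name max_val is taken from the question's desired output, not from this answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "idn": ["ID1", "ID1", "ID2", "ID2"],
    "value": [25, 30, 30, 50],
})

# transform returns one value per original row (same index as df),
# so it can be assigned straight back as a new column
df["max_val"] = df.groupby("idn")["value"].transform("max")

print(df)
```

Because transform keeps the original index, no merge or index juggling is needed.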
Paul H answered Sep 21 '22 11:09


Set the index of df to idn, then use df.merge. After the merge, reset the index and rename the columns.

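# Per-group maximum: a Series indexed by idn
dfmax = df.groupby('idn')['value'].max()

# Index df by idn so both sides can be merged on their index
df.set_index('idn', inplace=True)

# Both sides carry the name 'value', so merge suffixes them as value_x / value_y
df = df.merge(dfmax, how='outer', left_index=True, right_index=True)

# Move idn back to a regular column, then give the columns their final names
df.reset_index(inplace=True)

df.columns = ['idn', 'value', 'max_value']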
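
A variant of the merge approach that leaves the index alone might look like the sketch below. This is not the code from the answer above: the names group_max and merged are made up for illustration, and the sample frame is the one from the question. The aggregated column is renamed before merging, which avoids the value_x / value_y suffixes.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "idn": ["ID1", "ID1", "ID2", "ID2"],
    "value": [25, 30, 30, 50],
})

# Per-group maximum as a two-column frame (idn, max_value);
# as_index=False keeps idn as a regular column
group_max = (
    df.groupby("idn", as_index=False)["value"]
      .max()
      .rename(columns={"value": "max_value"})
)

# A left merge on idn keeps every original row
merged = df.merge(group_max, on="idn", how="left")

print(merged)
```

Merging on the idn column rather than on the index means the original frame is never modified in place.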
Haleemur Ali answered Sep 18 '22 11:09