Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas new column from groupby averages

I have a DataFrame

>>> df = pd.DataFrame({'a':[1,1,1,2,2,2],
...                    'b':[10,20,20,10,20,20],
...                    'result':[100,200,300,400,500,600]})
... 
>>> df
   a   b  result
0  1  10     100
1  1  20     200
2  1  20     300
3  2  10     400
4  2  20     500
5  2  20     600

and want to create a new column that is the average result for the corresponding values for 'a' and 'b'. I can get those values with a groupby:

>>> df.groupby(['a','b'])['result'].mean()
a  b 
1  10    100
   20    250
2  10    400
   20    550
Name: result, dtype: int64

but can not figure out how to turn that into a new column in the original DataFrame. The final result should look like this,

>>> df
   a   b  result  avg_result
0  1  10     100         100
1  1  20     200         250
2  1  20     300         250
3  2  10     400         400
4  2  20     500         550
5  2  20     600         550

I could do this by looping through the combinations of 'a' and 'b' but that would get really slow and unwieldy for larger sets of data. There is probably a much simpler and faster way to go.

like image 600
rurp Avatar asked Oct 30 '15 21:10

rurp


1 Answers

You need transform:

df['avg_result'] = df.groupby(['a', 'b'])['result'].transform('mean')

This generates a correctly indexed column of the groupby values for you:

   a   b  result  avg_result
0  1  10     100         100
1  1  20     200         250
2  1  20     300         250
3  2  10     400         400
4  2  20     500         550
5  2  20     600         550
like image 174
Alex Riley Avatar answered Sep 21 '22 20:09

Alex Riley