Subtract subgroup averages from individuals without resorting to for loop

Question

I have a dataframe with a number of columns, two of which are grouping variables.

>>> df2
   Groupvar1  Groupvar2         x         y         z
0          A          1  0.726317  0.574514  0.700475
1          A          2  0.422089  0.798931  0.191157
2          A          3  0.888318  0.658061  0.686496
....
13         B          2  0.978920  0.764266  0.673941
14         B          3  0.759589  0.162488  0.698958

and I want to make a new dataframe that holds the difference between each data point in the original df and the mean of the subgroup it belongs to.

So, to begin with, I make a new df with the grouped averages:

>>> grp_vars = ['Groupvar1','Groupvar2']
>>> df2_grp = df2.groupby(grp_vars)
>>> df2_grp_avg = df2_grp.mean()
>>> df2_grp_avg
                            x         y         z
Groupvar1 Groupvar2                              
A         1          0.364533  0.645237  0.886286
          2          0.325533  0.500077  0.246287
          3          0.796326  0.496950  0.510085
          4          0.774854  0.688732  0.487547
B         1          0.743783  0.452482  0.612006
          2          0.575687  0.396902  0.446126
          3          0.473152  0.476379  0.508060
          4          0.434320  0.406458  0.382187

and in the new dataframe I want to keep the deltas, defined as:

delta = individual value - average value of the subgroup this individual is a member of
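That definition maps almost literally onto pandas: `groupby(...).transform("mean")` returns a frame with the same rows as the input, where each row holds the mean of its own subgroup, so the delta is a plain subtraction. A minimal sketch, assuming the same kind of small hypothetical `df2` with value columns `x` and `y`:

```python
import pandas as pd

# Hypothetical stand-in for df2
df2 = pd.DataFrame({
    "Groupvar1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Groupvar2": [1, 1, 2, 2, 1, 1, 2, 2],
    "x": [1.0, 3.0, 2.0, 6.0, 5.0, 7.0, 0.0, 4.0],
    "y": [2.0, 4.0, 1.0, 1.0, 8.0, 2.0, 3.0, 5.0],
})

grp_vars = ["Groupvar1", "Groupvar2"]
value_cols = ["x", "y"]

# Same shape and index as df2[value_cols]; each row holds its subgroup's mean
grp_means = df2.groupby(grp_vars)[value_cols].transform("mean")

# delta = individual value - average value of the subgroup it belongs to
deltas = df2[value_cols] - grp_means
print(deltas)
```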

Now, it's clear to me how to do this the hard way (a for loop), but I suppose there must be a more elegant solution. I'd appreciate any advice on finding it. TIA.

behzad.nouri · Accepted Answer

Use the .groupby(...).transform method:

>>> # subtract each subgroup's column means from that subgroup's rows
>>> demean = lambda df: df - df.mean()
>>> df2.groupby(['Groupvar1', 'Groupvar2']).transform(demean)

and then pd.concat the result with the original dataframe (along axis=1).
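Putting both steps of the answer together, here is a minimal end-to-end sketch. It assumes a small hypothetical `df2` with value columns `x` and `y`. The `_delta` suffix is an illustrative choice added so the concatenated frame does not end up with two columns named `x` and two named `y`.

```python
import pandas as pd

# Hypothetical stand-in for df2
df2 = pd.DataFrame({
    "Groupvar1": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Groupvar2": [1, 1, 2, 2, 1, 1, 2, 2],
    "x": [1.0, 3.0, 2.0, 6.0, 5.0, 7.0, 0.0, 4.0],
    "y": [2.0, 4.0, 1.0, 1.0, 8.0, 2.0, 3.0, 5.0],
})

grp_vars = ["Groupvar1", "Groupvar2"]
value_cols = ["x", "y"]


def demean(group):
    # Subtract the group's own mean from each of its values
    return group - group.mean()


# transform keeps the original row index, so the result lines up with df2
deltas = df2.groupby(grp_vars)[value_cols].transform(demean)

# Join the deltas next to the original columns (axis=1 concatenates column-wise)
result = pd.concat([df2, deltas.add_suffix("_delta")], axis=1)
print(result)
```

Because `transform` preserves the original index, `pd.concat(..., axis=1)` aligns each delta with its source row without any explicit matching.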


Tags: python, pandas, vectorization

Charlie_M
