I have a dataframe with a number of columns, two of which are grouping variables.
>>> df2
Groupvar1 Groupvar2 x y z
0 A 1 0.726317 0.574514 0.700475
1 A 2 0.422089 0.798931 0.191157
2 A 3 0.888318 0.658061 0.686496
....
13 B 2 0.978920 0.764266 0.673941
14 B 3 0.759589 0.162488 0.698958
and I want to make a new dataframe which holds the diffrence between each datapoint in the origianl df and the mean corresponding to its subgroup.
So to begin with a make the new df with the grouped averages:
>>> grp_vars = ['Groupvar1','Groupvar2']
>>> df2_grp = df2.groupby(grp_vars)
>>> df2_grp_avg = df2_grp.mean()
>>> df2_grp_avg
x y z
Groupvar1 Groupvar2
A 1 0.364533 0.645237 0.886286
2 0.325533 0.500077 0.246287
3 0.796326 0.496950 0.510085
4 0.774854 0.688732 0.487547
B 1 0.743783 0.452482 0.612006
2 0.575687 0.396902 0.446126
3 0.473152 0.476379 0.508060
4 0.434320 0.406458 0.382187
and in the new dtaframe I want to keep the deltas, defined as:
delta = individual value - average value of the subgroup this individual is a member of
Now, it's clear to me how to do this the hard way (for loop) but I supose there must be a more elegant solution. Apprecaite any advice on finding that more elegant solution. TIA.
Use .groupby(...).transform
function:
>>> demean = lambda df: df - df.mean()
>>> df.groupby(['Groupvar1', 'Groupvar2']).transform(demean)
ant then pd.concat
the result with the original data-frame.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With