How do I normalize a multiindex dataframe?
Let's say I have the dataframe:
d = pd.DataFrame([["a",1,3],["a",2,2],["b",4,4],["b",5,8]],
columns=["name","value1","value2"])
How do I calculate the normalized values for each "name"?
I know how to normalize a basic dataframe:
d = (d-d.mean(axis=0))/d.std(axis=0, ddof=1)
but I'm not able to apply this to each "name" group of my dataframe.
So the result I want is:
name  value1  value2
a       -0.5     0.5
a        0.5    -0.5
b       -0.5      -1
b        0.5       1
I tried groupby and a multiindex dataframe, but I'm probably not doing it the right way.
Normalizing by group is one of the examples in the groupby documentation. But it doesn't do exactly what you seem to want here.
In [2]: d.groupby('name').transform(lambda x: (x-x.mean())/x.std(ddof=1))
Out[2]:
     value1    value2
0 -0.707107  0.707107
1  0.707107 -0.707107
2 -0.707107 -0.707107
3  0.707107  0.707107
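For reference, here is a self-contained sketch of the same per-column, per-group normalization without a lambda. It assumes the example frame from the question; the names value_cols, grouped, zscores and result are only illustrative. It relies on the groupby "std" aggregation using ddof=1 by default, which matches the lambda above.

```python
import pandas as pd

d = pd.DataFrame(
    [["a", 1, 3], ["a", 2, 2], ["b", 4, 4], ["b", 5, 8]],
    columns=["name", "value1", "value2"],
)

value_cols = ["value1", "value2"]  # illustrative name, not from the question
grouped = d.groupby("name")[value_cols]

# transform("mean") and transform("std") return frames with the same index
# as d, so the arithmetic is element-wise per group and per column.
# The groupby "std" aggregation defaults to ddof=1.
zscores = (d[value_cols] - grouped.transform("mean")) / grouped.transform("std")

# Put the group label back next to the normalized values.
result = pd.concat([d[["name"]], zscores], axis=1)
print(result)
```

This should give the same numbers as Out[2], with the name column kept alongside them.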
Your desired result suggests that you actually want to normalize the values in each name group with reference to the elements in value1 and value2. For something like that, you can apply a function to each group individually, and reassemble the result.
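In [3]: def normalize(group):
   ...:     # pool every value in the group (all columns) into one array
   ...:     mean = group.values.ravel().mean()
   ...:     std = group.values.ravel().std(ddof=1)
   ...:     # element-wise arithmetic; avoids applymap, which newer pandas deprecates
   ...:     return (group - mean) / std
   ...: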
In [4]: pd.concat([normalize(group) for _, group in d.set_index('name').groupby(level=0)])
Out[4]:
        value1    value2
name
a    -1.224745  1.224745
a     0.000000  0.000000
b    -0.660338 -0.660338
b    -0.132068  1.452744
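If you would rather keep the original row index and avoid pd.concat, one possible variant is to compute the pooled mean and standard deviation once per group and map them back onto the rows. This is only a sketch under the same assumptions as above; pooled_stats, stats and pooled are hypothetical names, not pandas API.

```python
import pandas as pd

d = pd.DataFrame(
    [["a", 1, 3], ["a", 2, 2], ["b", 4, 4], ["b", 5, 8]],
    columns=["name", "value1", "value2"],
)

value_cols = ["value1", "value2"]  # illustrative name


def pooled_stats(group):
    """Mean and sample std (ddof=1) over all values of a group, both columns pooled."""
    values = group.to_numpy().ravel()
    return pd.Series({"mean": values.mean(), "std": values.std(ddof=1)})


# One row per name, with columns "mean" and "std".
stats = d.groupby("name")[value_cols].apply(pooled_stats)

# Look up each row's group statistics, aligned with d's index.
mean = d["name"].map(stats["mean"])
std = d["name"].map(stats["std"])

# Subtract and divide row-wise (axis=0 aligns the Series with the row index).
pooled = d[value_cols].sub(mean, axis=0).div(std, axis=0)
print(pd.concat([d[["name"]], pooled], axis=1))
```

The values should match Out[4]; the difference is that the rows keep their original 0..3 index instead of being indexed by name.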