Normalize values in a multiindex dataframe?

How do I normalize a multiindex dataframe?

Let's say I have the dataframe:

import pandas as pd
d = pd.DataFrame([["a",1,3],["a",2,2],["b",4,4],["b",5,8]], 
                  columns=["name","value1","value2"])

How do I calculate the normalized values for each "name"?

I know how to normalize a basic dataframe:

d = (d-d.mean(axis=0))/d.std(axis=0, ddof=1)

but I can't apply this to each "name" group of my dataframe.

So the result I want is:

name  value1  value2
a       -0.5     0.5
a        0.5    -0.5
b       -0.5    -1
b        0.5     1

I tried groupby and a multiindex dataframe, but I'm probably not doing it the right way.

user1883737 asked Mar 23 '23 04:03



1 Answer

Normalizing by group is one of the examples in the pandas groupby documentation. That approach normalizes each column separately within each group, though, which isn't exactly what you seem to want here.

In [2]: d.groupby('name').transform(lambda x: (x-x.mean())/x.std(ddof=1))
Out[2]: 
     value1    value2
0 -0.707107  0.707107
1  0.707107 -0.707107
2 -0.707107 -0.707107
3  0.707107  0.707107
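
For reference, the same per-column normalization can be written as a standalone script without a lambda. This is only a sketch: `value_cols`, `grouped`, `per_column` and `result` are names made up here for illustration, and it assumes the dataframe from the question. It should give the same numbers as Out[2], with the "name" column kept alongside.

```python
import pandas as pd

d = pd.DataFrame(
    [["a", 1, 3], ["a", 2, 2], ["b", 4, 4], ["b", 5, 8]],
    columns=["name", "value1", "value2"],
)
value_cols = ["value1", "value2"]  # illustrative name, not from the question

grouped = d.groupby("name")[value_cols]
# transform("mean") and transform("std") return frames aligned to d's rows;
# the grouped std uses ddof=1 by default.
per_column = (d[value_cols] - grouped.transform("mean")) / grouped.transform("std")

# Put the group label back next to the normalized values.
result = pd.concat([d[["name"]], per_column], axis=1)
print(result)
```

Using the built-in "mean" and "std" transforms avoids calling a Python function once per group and column, which tends to matter only when there are many groups.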

Your desired result suggests that you actually want to normalize the values in each name group against a single mean and standard deviation computed over all of that group's elements, pooling value1 and value2 together. For something like that, you can apply a function to each group individually and reassemble the result.

In [3]: def normalize(group):
   ...:     # Pool every value in the group (all columns) into one mean and one std.
   ...:     mean = group.values.ravel().mean()
   ...:     std = group.values.ravel().std(ddof=1)
   ...:     return (group - mean) / std
   ...: 

In [4]: pd.concat([normalize(group) for _, group in d.set_index('name').groupby(level=0)])
Out[4]: 
        value1    value2
name                    
a    -1.224745  1.224745
a     0.000000  0.000000
b    -0.660338 -0.660338
b    -0.132068  1.452744
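
If you would rather keep the original row index and avoid the explicit loop and concat, one possible alternative is to reshape to long format, compute one mean and std per name, and map them back onto the rows. Treat this as a sketch under the same assumptions as above: `value_cols`, `long`, `stats`, `row_mean`, `row_std` and `pooled` are illustrative names, not anything from the question, and it should give the same numbers as Out[4].

```python
import pandas as pd

d = pd.DataFrame(
    [["a", 1, 3], ["a", 2, 2], ["b", 4, 4], ["b", 5, 8]],
    columns=["name", "value1", "value2"],
)
value_cols = ["value1", "value2"]  # illustrative name, not from the question

# Long format: one row per (name, column, value), so each group's values are pooled.
long = d.melt(id_vars="name", value_vars=value_cols)

# One pooled mean and sample std (ddof=1 is the pandas default) per name.
stats = long.groupby("name")["value"].agg(["mean", "std"])

# Look up each row's group statistics, then normalize row by row.
row_mean = d["name"].map(stats["mean"])
row_std = d["name"].map(stats["std"])
pooled = d[value_cols].sub(row_mean, axis=0).div(row_std, axis=0)
pooled.insert(0, "name", d["name"])
print(pooled)
```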
Dan Allan answered Mar 29 '23 02:03
