pandas divide two multi index series

I have a multi-index series that looks like

            value
foo bar baz     
1   A    C    6
         D    2
    B    D    6
         F    4
2   B    C    5
         F    7

I would like to sum over foo and bar, i.e. get the total of value for each (foo, bar) pair regardless of baz. I can do that with df.groupby(level=[0, 1]).sum(), which gives:

         sum_value
foo bar           
1   A            8
    B           10
2   B           12
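
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the table above and computes the grouped sum. The variable names (index, df, sums) and the rename to sum_value are only assumptions made for illustration.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the table in the question.
index = pd.MultiIndex.from_tuples(
    [(1, "A", "C"), (1, "A", "D"), (1, "B", "D"),
     (1, "B", "F"), (2, "B", "C"), (2, "B", "F")],
    names=["foo", "bar", "baz"],
)
df = pd.DataFrame({"value": [6, 2, 6, 4, 5, 7]}, index=index)

# Sum over baz by grouping on the first two index levels (foo, bar).
sums = df.groupby(level=[0, 1]).sum().rename(columns={"value": "sum_value"})
print(sums)
```

Note that sums has only two index levels (foo, bar), while df has three, which is why the two do not line up directly.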

However, I would then like to divide each original value by the matching sum_value, to get the share of each baz within its (foo, bar) group:

            value
foo bar baz     
1   A    C    6/8=.75
         D    2/8=.25
    B    D    6/10=.6
         F    4/10=.4
2   B    C    5/12=.42
         F    7/12=.58

I have tried df.div(df.groupby(level=[0, 1]).sum()), but I get a NotImplementedError. Thanks!

Jake Morris asked Dec 18 '17 21:12



2 Answers

You can use transform to get the group sums with the same index as the original DataFrame, then use div, which relies on pandas' built-in index alignment:

df.div(df.groupby(['foo','bar']).transform('sum'))

Output:

                value
foo bar baz          
1   A   C    0.750000
        D    0.250000
    B   D    0.600000
        F    0.400000
2   B   C    0.416667
        F    0.583333
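
A self-contained sketch of the same idea, assuming the frame from the question. The names group_sums and shares are made up for illustration, and level= is used instead of passing the level names positionally; both forms should group on the same index levels.

```python
import pandas as pd

# Example frame from the question.
index = pd.MultiIndex.from_tuples(
    [(1, "A", "C"), (1, "A", "D"), (1, "B", "D"),
     (1, "B", "F"), (2, "B", "C"), (2, "B", "F")],
    names=["foo", "bar", "baz"],
)
df = pd.DataFrame({"value": [6, 2, 6, 4, 5, 7]}, index=index)

# transform("sum") broadcasts each group's total back onto every row of that
# group, so the result keeps the full three-level index of df.
group_sums = df.groupby(level=["foo", "bar"]).transform("sum")

# Both operands now share an identical index, so div aligns row by row.
shares = df.div(group_sums)
print(shares)
```

Because transform keeps the original index, no reindexing or level dropping is needed before dividing.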
Scott Boston answered Sep 25 '22 17:09


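Another option is to drop the baz level so the index lines up with the grouped sums, divide, and assign the resulting values back. (df.sum(level=...) was removed in pandas 2.0, so the groupby form is used here.) This relies on the rows staying in the same order, since .values discards the index.

In [40]: df['value'] = df.reset_index('baz', drop=True).div(df.groupby(level=[0, 1]).sum()).values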

In [41]: df
Out[41]:
                value
foo bar baz
1.0 A   C    0.750000
        D    0.250000
    B   D    0.600000
        F    0.400000
2.0 B   C    0.416667
        F    0.583333
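
If you would rather not depend on index alignment or row order at all, a different technique gives the same numbers: flatten the index into columns, merge the group sums back on foo and bar, and divide. This is a sketch of that alternative, not the one-liner above; the names flat, sums, merged, result and the columns sum_value and share are assumptions for illustration.

```python
import pandas as pd

# Example frame from the question.
index = pd.MultiIndex.from_tuples(
    [(1, "A", "C"), (1, "A", "D"), (1, "B", "D"),
     (1, "B", "F"), (2, "B", "C"), (2, "B", "F")],
    names=["foo", "bar", "baz"],
)
df = pd.DataFrame({"value": [6, 2, 6, 4, 5, 7]}, index=index)

# Move the index levels into ordinary columns.
flat = df.reset_index()

# One row per (foo, bar) pair holding the group total.
sums = (
    flat.groupby(["foo", "bar"], as_index=False)["value"]
    .sum()
    .rename(columns={"value": "sum_value"})
)

# A left merge attaches the matching total to every original row.
merged = flat.merge(sums, on=["foo", "bar"], how="left")
merged["share"] = merged["value"] / merged["sum_value"]

# Restore the original three-level index.
result = merged.set_index(["foo", "bar", "baz"])[["share"]]
print(result)
```

This is more verbose than transform, but each intermediate table can be inspected, which can help when the groups are not what you expected.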
MaxU - stop WAR against UA answered Sep 25 '22 17:09