pandas divide two multi index series

Tags:

python

pandas

group-by

I have a multi-index series that looks like

            value
foo bar baz     
1   A    C    6
         D    2
    B    D    6
         F    4
2   B    C    5
         F    7

I would like to sum on foo and bar, to get the sum of values for each foo, bar, regardless of baz, which I can achieve with df.groupby(level=[0, 1]).sum(). This series looks like:

        sum_value
foo bar      
1   A      8
    B      10
2   B      12

However, I would then like to divide the original value by the new sum_value, to get the percentage of baz, given foo and bar.

            value
foo bar baz     
1   A    C    6/8=.75
         D    2/8=.25
    B    D    6/10=.6
         F    4/10=.4
2   B    C    5/12=.42
         F    7/12=.58

I have tried df.div(df.groupby(level=[0, 1]).sum()), but I get a NotImplementedError. Thanks!

483

asked Dec 18 '17 21:12

Jake Morris

2 Answers

You can use transform to compute the group sums on the same index as the original dataframe, then use div, which relies on pandas' built-in index alignment:

df.div(df.groupby(['foo','bar']).transform('sum'))

Output:

                value
foo bar baz          
1   A   C    0.750000
        D    0.250000
    B   D    0.600000
        F    0.400000
2   B   C    0.416667
        F    0.583333
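
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the frame is built by hand from the values in the question; the variable names `group_sums` and `shares` are only illustrative, and the levels are selected with `level=` rather than by passing the names positionally.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_tuples(
    [
        (1, "A", "C"),
        (1, "A", "D"),
        (1, "B", "D"),
        (1, "B", "F"),
        (2, "B", "C"),
        (2, "B", "F"),
    ],
    names=["foo", "bar", "baz"],
)
df = pd.DataFrame({"value": [6, 2, 6, 4, 5, 7]}, index=index)

# transform('sum') returns one row per original row, so the result
# keeps the full (foo, bar, baz) index instead of collapsing to (foo, bar).
group_sums = df.groupby(level=["foo", "bar"]).transform("sum")

# Both frames now share the same index, so div aligns row by row.
shares = df.div(group_sums)
print(shares)
```

Because `transform` keeps the original index, no join between a three-level and a two-level index is needed, which is the step the plain `.sum()` version trips over.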

152

answered Sep 25 '22 17:09

Scott Boston

In [40]: df['value'] = df.reset_index('baz', drop=True).div(df.groupby(level=[0, 1]).sum()).values

In [41]: df
Out[41]:
                value
foo bar baz
1.0 A   C    0.750000
        D    0.250000
    B   D    0.600000
        F    0.400000
2.0 B   C    0.416667
        F    0.583333
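
A variation on the same positional idea, as a rough sketch rather than the answerer's exact method: look up each row's group sum with `reindex`, then divide the underlying arrays. The frame is rebuilt by hand from the question, and the `share` column name and the `sums` and `row_sums` variables are made up for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.MultiIndex.from_tuples(
    [
        (1, "A", "C"),
        (1, "A", "D"),
        (1, "B", "D"),
        (1, "B", "F"),
        (2, "B", "C"),
        (2, "B", "F"),
    ],
    names=["foo", "bar", "baz"],
)
df = pd.DataFrame({"value": [6, 2, 6, 4, 5, 7]}, index=index)

# One row per (foo, bar) pair.
sums = df.groupby(level=["foo", "bar"]).sum()

# Repeat each group's sum once per original row by looking it up
# with the (foo, bar) part of the original index.
row_sums = sums.reindex(index=df.index.droplevel("baz"))

# Divide positionally; both arrays follow the original row order.
df["share"] = df["value"].to_numpy() / row_sums["value"].to_numpy()
print(df)
```

This writes the result to a new column instead of overwriting `value`, so the original numbers stay available for checking.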

answered Sep 25 '22 17:09

MaxU - stop WAR against UA
