I have something similar to this
df = pd.DataFrame(np.random.randint(2, 10, size = (5, 2)))
df.index = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'A'), (4, 'B'), 
           (5, 'B'), (8, 'B')])
df.index.names = ['foo', 'bar']
df.columns = ['count1', 'count2']
df
which gives:
       count1 count2
foo bar     
1   A    6     7
2   A    2     9
4   B    6     7
5   B    4     6
8   B    5     6
I also have a list of totals, obtained from somewhere else, indexed by the same 'foo' values:
totals = pd.DataFrame([2., 1., 1., 1., 10.])
totals.index = [1, 2, 4, 5, 8]
totals.index.names = ['foo']
totals
which gives:
     0
foo 
1    2
2    1
4    1
5    1
8    10
How can I divide all the columns of df (count1 and count2) by the value in totals that has the same foo number? (That is, I need to match rows on the 'foo' level.)
I checked this question, which looks like it should do the trick, but I couldn't figure it out.
I tried
df.div(totals, axis = 0)
and also tried changing the level option of div, but without success.
As always, thanks a lot for your time
You can use the following basic syntax to use GroupBy on a pandas DataFrame with a MultiIndex:
# calculate sum by level 0 and 1 of multiindex
df.groupby(level=[0, 1]).sum()
# calculate count by level 0 and 1 of multiindex
df.
The div() method performs element-wise division of one pandas DataFrame by another. A DataFrame can also be divided by a pandas Series or by a Python sequence. Calling div() on a DataFrame with its default arguments is equivalent to using the division operator (/).
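To make the equivalence concrete, here is a minimal sketch with small made-up frames (the names other, via_method, via_operator and by_series are illustrative, not from the question). It compares div() with the / operator and also divides row-wise by a Series:

```python
import pandas as pd

# Two small frames with identical labels (illustrative data only).
df = pd.DataFrame({"count1": [6, 2], "count2": [7, 9]})
other = pd.DataFrame({"count1": [2, 1], "count2": [2, 1]})

# Element-wise division, once via the method and once via the operator.
via_method = df.div(other)
via_operator = df / other
same = via_method.equals(via_operator)

# Dividing by a Series along the rows: axis=0 matches the Series index
# against the DataFrame's row index instead of its column labels.
by_series = df.div(pd.Series([2, 1]), axis=0)

print(same)
print(by_series)
```

The method form is the one to reach for when extra arguments such as axis or level are needed, which the operator cannot take.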
Try:
df.div(totals[0],axis='index',level='foo')
         count1  count2
foo bar                
1   A       1.0     4.5
2   A       4.0     8.0
4   B       5.0     9.0
5   B       5.0     5.0
8   B       0.9     0.5
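The output above comes from a different random draw than the one in the question. As a reproducible sketch, the same call can be run on the exact numbers printed in the question (hard-coded here instead of np.random, which is an assumption made for repeatability):

```python
import pandas as pd

# Rebuild the question's frame with the values it printed.
index = pd.MultiIndex.from_tuples(
    [(1, 'A'), (2, 'A'), (4, 'B'), (5, 'B'), (8, 'B')],
    names=['foo', 'bar'],
)
df = pd.DataFrame(
    {'count1': [6, 2, 6, 4, 5], 'count2': [7, 9, 7, 6, 6]},
    index=index,
)

# Totals keyed only by 'foo'.
totals = pd.DataFrame(
    {0: [2., 1., 1., 1., 10.]},
    index=pd.Index([1, 2, 4, 5, 8], name='foo'),
)

# Pass the column as a Series and tell div which level of df's
# MultiIndex the Series index should be matched against.
result = df.div(totals[0], axis='index', level='foo')
print(result)
```

Note that totals[0] selects a Series; passing the whole totals DataFrame would make pandas try to align column labels as well, and 'count1'/'count2' do not match the column 0.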
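Also, if totals carries both index levels, no level argument is needed; rows whose (foo, bar) labels do not match on both sides come out as NaN: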
totals = pd.DataFrame([2., 1., 1., 1., 10.])
totals.index = [[1, 2, 4, 5, 8],['A', 'A', 'B', 'A', 'B']]
totals.index.names = ['foo','bar']
totals
           0
foo bar      
1   A     2.0
2   A     1.0
4   B     1.0
5   A     1.0
8   B    10.0
df[['count1','count2']].div(totals[0],axis='index')
         count1  count2
foo bar                
1   A       1.0     4.5
2   A       4.0     8.0
4   B       5.0     9.0
5   A       NaN     NaN
    B       NaN     NaN
8   B       0.9     0.5
Using the array of values from totals[0] works:
df.div(totals[0].values, axis=0)
But this divides purely by position and does not take the index of totals into account. I don't know why this does not work:
df.div(totals[0], level=0, axis=0)
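One possible way to keep the array-based division while still respecting the index of totals is to look up each row's total by its foo label first. This is a minimal sketch on the question's printed values, not necessarily the approach the author had in mind; the name per_row is introduced here for illustration:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, 'A'), (2, 'A'), (4, 'B'), (5, 'B'), (8, 'B')],
    names=['foo', 'bar'],
)
df = pd.DataFrame(
    {'count1': [6, 2, 6, 4, 5], 'count2': [7, 9, 7, 6, 6]},
    index=index,
)
totals = pd.DataFrame(
    {0: [2., 1., 1., 1., 10.]},
    index=pd.Index([1, 2, 4, 5, 8], name='foo'),
)

# Pick, for every row of df, the total belonging to its 'foo' label.
# A 'foo' value missing from totals would yield NaN for that row.
per_row = totals[0].reindex(df.index.get_level_values('foo'))

# The lookup already put the totals in df's row order, so a plain
# positional division is now safe.
result = df.div(per_row.to_numpy(), axis=0)
print(result)
```

Unlike totals[0].values used directly, this does not depend on totals happening to be in the same row order as df.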