
Pandas division (.div) with multiindex

Tags: python, pandas

I have a DataFrame similar to this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(2, 10, size=(5, 2)))
df.index = pd.MultiIndex.from_tuples([(1, 'A'), (2, 'A'), (4, 'B'),
                                      (5, 'B'), (8, 'B')])
df.index.names = ['foo', 'bar']
df.columns = ['count1', 'count2']
df

which gives:

       count1 count2
foo bar     
1   A    6     7
2   A    2     9
4   B    6     7
5   B    4     6
8   B    5     6

I also have a set of totals, obtained from somewhere else, indexed by the same 'foo' values:

totals = pd.DataFrame([2., 1., 1., 1., 10.])
totals.index = [1, 2, 4, 5, 8]
totals.index.names = ['foo']
totals

which gives:

     0
foo 
1    2
2    1
4    1
5    1
8    10

How can I divide all the columns of df (count1 and count2) by the total in totals that has the same 'foo' value? (That is, I need to match rows by their 'foo' number.)

I checked this question, which looks like it should do the trick, but I couldn't figure it out.

I tried:

df.div(totals, axis = 0)

and varying the level option of div, but without success.

As always, thanks a lot for your time.

asked Oct 22 '13 at 16:10 by cd98



2 Answers

Try passing the totals column as a Series and broadcasting over the 'foo' level:

df.div(totals[0],axis='index',level='foo')

         count1  count2
foo bar                
1   A       1.0     4.5
2   A       4.0     8.0
4   B       5.0     9.0
5   B       5.0     5.0
8   B       0.9     0.5
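
Since the question builds df from random numbers, the output above cannot be reproduced exactly. Here is a minimal, self-contained sketch of the same call, assuming the fixed values printed in the question (6, 7, 2, 9, ...) and holding totals as a Series rather than a one-column DataFrame:

```python
import pandas as pd

# Fixed values copied from the frame printed in the question
# (an assumption for reproducibility; the original data was random).
df = pd.DataFrame(
    {"count1": [6, 2, 6, 4, 5], "count2": [7, 9, 7, 6, 6]},
    index=pd.MultiIndex.from_tuples(
        [(1, "A"), (2, "A"), (4, "B"), (5, "B"), (8, "B")],
        names=["foo", "bar"],
    ),
)

# Totals keyed only by 'foo', held as a Series.
totals = pd.Series(
    [2.0, 1.0, 1.0, 1.0, 10.0],
    index=pd.Index([1, 2, 4, 5, 8], name="foo"),
)

# level='foo' matches the Series index against the 'foo' level of df's
# MultiIndex; axis='index' divides row-wise rather than column-wise.
result = df.div(totals, axis="index", level="foo")
print(result)
```

With these values, the row for foo=1 is divided by 2 and the row for foo=8 by 10, while the other rows are unchanged because their totals are 1.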

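Alternatively, if totals carries both index levels, div aligns on the full (foo, bar) label and no level argument is needed. Note that any label present in only one of the two objects produces NaN: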

totals = pd.DataFrame([2., 1., 1., 1., 10.])
totals.index = [[1, 2, 4, 5, 8],['A', 'A', 'B', 'A', 'B']]
totals.index.names = ['foo','bar']
totals

           0
foo bar      
1   A     2.0
2   A     1.0
4   B     1.0
5   A     1.0
8   B    10.0

df[['count1','count2']].div(totals[0],axis='index')

         count1  count2
foo bar                
1   A       1.0     4.5
2   A       4.0     8.0
4   B       5.0     9.0
5   A       NaN     NaN
    B       NaN     NaN
8   B       0.9     0.5
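
The NaN rows come from label alignment: without a level argument, pandas matches on the whole (foo, bar) tuple and returns the union of both indexes. A small sketch of that behaviour, again assuming the fixed values printed in the question and a totals Series that has (5, 'A') where df has (5, 'B'):

```python
import pandas as pd

df = pd.DataFrame(
    {"count1": [6, 2, 6, 4, 5], "count2": [7, 9, 7, 6, 6]},
    index=pd.MultiIndex.from_tuples(
        [(1, "A"), (2, "A"), (4, "B"), (5, "B"), (8, "B")],
        names=["foo", "bar"],
    ),
)

# Same two levels as df, but (5, 'A') here versus (5, 'B') in df.
totals = pd.Series(
    [2.0, 1.0, 1.0, 1.0, 10.0],
    index=pd.MultiIndex.from_tuples(
        [(1, "A"), (2, "A"), (4, "B"), (5, "A"), (8, "B")],
        names=["foo", "bar"],
    ),
)

# No `level`: rows are matched on the full (foo, bar) tuple.
result = df.div(totals, axis="index")

# Labels that exist on only one side have no partner and become NaN.
unmatched = result[result["count1"].isna()].index.tolist()
print(result)
print(unmatched)
```

If the extra rows are unwanted, one option is to restrict the result to df's own labels afterwards, for example with result.reindex(df.index).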
answered Sep 20 '22 at 02:09 by user8641707


Dividing by the raw array of values from totals[0] works:

df.div(totals[0].values, axis=0)

But this ignores the index of totals and divides purely by position, so it is only correct when totals is in the same row order as df. I don't know why this does not work:

df.div(totals[0], level=0, axis=0)
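
To illustrate the difference between the two calls, here is a rough sketch that stores totals in reverse order (an assumption made only for this demonstration) and compares positional division with level-based division. On the pandas versions I am aware of, the level=0 form runs and aligns by label; the failure described above may be specific to the version in use at the time.

```python
import pandas as pd

df = pd.DataFrame(
    {"count1": [6, 2, 6, 4, 5], "count2": [7, 9, 7, 6, 6]},
    index=pd.MultiIndex.from_tuples(
        [(1, "A"), (2, "A"), (4, "B"), (5, "B"), (8, "B")],
        names=["foo", "bar"],
    ),
)

# Same totals as in the question, deliberately stored in reverse order.
totals = pd.Series(
    [10.0, 1.0, 1.0, 1.0, 2.0],
    index=pd.Index([8, 5, 4, 2, 1], name="foo"),
)

# Positional: row i of df is divided by totals.values[i]; labels are ignored.
positional = df.div(totals.values, axis=0)

# Label-based: each row is divided by the total whose 'foo' label matches.
by_label = df.div(totals, axis=0, level=0)

print(positional.loc[(1, "A"), "count1"])  # 6 divided by 10 (wrong total)
print(by_label.loc[(1, "A"), "count1"])    # 6 divided by 2 (total for foo=1)
```

The positional form silently gives wrong numbers once the two objects are ordered differently, which is why matching on the 'foo' level is the safer choice.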
answered Sep 20 '22 at 02:09 by Roman Pekar