I have a pandas dataframe with two multiindex levels
df
Out[202]:
A B C D
first second
1 1 -0.080810 0.865259 -0.371148 0.346480
2 -0.026636 1.259460 -1.109295 -0.871204
3 0.372008 -1.778272 0.727838 0.620727
4 0.918075 0.564741 2.027432 -1.614162
5 -0.373527 -0.186027 0.225399 0.722733
2 1 0.344241 0.170596 -0.050763 2.692102
2 -1.665413 0.357033 -0.691327 -0.983103
3 1.277470 -1.841702 0.582107 -0.454315
4 2.374108 -0.557879 0.797296 0.803622
5 -1.001092 0.131217 0.131378 -0.636299
I also have a second dataframe consisting of two rows out of the original dataframe, corresponding to the multiindex level 'second == 1'
to_subtract = df.query('second == 1')
to_subtract
Out[200]:
A B C D
first second
1 1 -0.080810 0.865259 -0.371148 0.346480
2 1 0.344241 0.170596 -0.050763 2.692102
I want to subtract 'df' by 'to_subtract' on the first level, but for all values that correspond to the first level of the multiindex. Pandas knows how to subtract based on matching ALL levels of the multiindex, as shown below
df.sub(to_subtract)
Out[201]:
A B C D
first second
1 1 0.0 0.0 0.0 0.0
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN
2 1 0.0 0.0 0.0 0.0
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN
Question is, how do I subtract all values from df[first===1] by to_subtract[first==1], and all values from df[first==2] by to_subtract[first==2]. I assume I can do a for loop and loop through everything, but I would rather avoid that if I can, since these dataframes may get large in the future.
Thanks in advance
subtract() function basically perform subtraction of series and other, element-wise (binary operator sub). It is equivalent to series - other , but with support to substitute a fill_value for missing data in one of the inputs.
Drop Level Using MultiIndex.droplevel() to drop columns level. When you have Multi-level columns DataFrame. columns return MultiIndex object and use droplevel() on this object to drop level.
subtract() function is used for finding the subtraction of dataframe and other, element-wise. This function is essentially same as doing dataframe – other but with a support to substitute for missing data in one of the inputs.
I think need remove second level of MultiIndex
and then add parameter level=0
for align by first levels in DataFrame.sub
:
to_subtract = df.query('second == 1').reset_index(level=1, drop=True)
#same as
#to_subtract = df.xs(1, level=1)
print (to_subtract)
A B C D
first
1 -0.080810 0.865259 -0.371148 0.346480
2 0.344241 0.170596 -0.050763 2.692102
df1 = df.sub(to_subtract, level=0)
print (df1)
A B C D
first second
1 1 0.000000 0.000000 0.000000 0.000000
2 0.054174 0.394201 -0.738147 -1.217684
3 0.452818 -2.643531 1.098986 0.274247
4 0.998885 -0.300518 2.398580 -1.960642
5 -0.292717 -1.051286 0.596547 0.376253
2 1 0.000000 0.000000 0.000000 0.000000
2 -2.009654 0.186437 -0.640564 -3.675205
3 0.933229 -2.012298 0.632870 -3.146417
4 2.029867 -0.728475 0.848059 -1.888480
5 -1.345333 -0.039379 0.182141 -3.328401
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With