Math operations between columns in a MultiIndex DataFrame

I have a DataFrame with a column MultiIndex that I need to slice, and then I need to perform math operations between the slices.

# sample df
import numpy as np
import pandas as pd

idx = pd.IndexSlice
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

If I want to add or subtract individual columns, I can use an IndexSlice like this:

df.loc[:,idx['three','foo']] - df.loc[:,idx['two','foo']]

However, if I use a higher-level slice, it doesn't work and returns NaNs:

# not working
df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]]
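The NaNs come from label alignment: pandas arithmetic between two DataFrames matches columns by their full labels, and a tuple such as ('three', 'foo') never equals ('two', 'foo'). A minimal sketch that rebuilds the sample frame and inspects the failing result (the names left, right and result are illustrative only):

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

left = df.loc[:, idx['three', :]]   # columns ('three', 'foo'), ('three', 'bar')
right = df.loc[:, idx['two', :]]    # columns ('two', 'foo'), ('two', 'bar')

# Arithmetic aligns on the full column tuples. No tuple appears in both
# operands, so the result holds the union of all four labels, all NaN.
result = left - right
print(result.columns.tolist())
print(result.isna().all().all())
```

Both answers below work by making the labels match before subtracting, either by renaming the first level or by dropping it.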

Is there an easy way to use higher-level slices of the DataFrame and add/subtract only the corresponding columns? My DataFrame potentially contains hundreds of columns in the MultiIndex. Thanks

asked Mar 06 '19 10:03

whada

2 Answers

If you need a MultiIndex in the output, use rename so that both slices share the same first-level label:

df1 = df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]].rename(columns={'two':'three'})
print(df1)
first      three          
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137
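A variant of the rename step, offered as a sketch: DataFrame.rename accepts a level argument, so the mapping can be restricted to the first column level. This only matters if the same label could also occur in another level; on the sample data it gives the same result. The names right and df1 are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

# level=0 restricts the renaming to the 'first' level, so a 'two' that
# happened to appear in 'second' would be left untouched.
right = df.loc[:, idx['two', :]].rename(columns={'two': 'three'}, level=0)
df1 = df.loc[:, idx['three', :]] - right
print(df1)
```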

The advantage is that both slices can be renamed to a new first-level label and the difference joined to the original DataFrame:

df = (df.join(df.loc[:,idx['three',:]].rename(columns={'three':'four'}) - 
              df.loc[:,idx['two',:]].rename(columns={'two':'four'})))
print (df)
first        one                 two               three                four  \
second       foo       bar       foo       bar       foo       bar       foo   
A      -1.085631  0.997345  0.282978 -1.506295 -0.578600  1.651437 -0.861579   
B      -2.426679 -0.428913  1.265936 -0.866740 -0.678886 -0.094709 -1.944822   
C       1.491390 -0.638902 -0.443982 -0.434351  2.205930  2.186786  2.649912   

first             
second       bar  
A       3.157731  
B       0.772031  
C       2.621137
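With hundreds of columns, the same rename-and-join idea can be wrapped in a small function and applied in a loop. append_difference and the (minuend, subtrahend, new label) pairs below are hypothetical, not part of pandas; this is a sketch assuming every first-level group has the same second-level labels.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)


def append_difference(frame, minuend, subtrahend, new):
    """Hypothetical helper: join frame[minuend] - frame[subtrahend] under label `new`."""
    idx = pd.IndexSlice
    # Relabel both groups to `new` so their columns match tuple for tuple.
    left = frame.loc[:, idx[minuend, :]].rename(columns={minuend: new}, level=0)
    right = frame.loc[:, idx[subtrahend, :]].rename(columns={subtrahend: new}, level=0)
    return frame.join(left - right)


# Illustrative pairs: (minuend, subtrahend, new first-level label).
for a, b, new in [('three', 'two', 'four'), ('two', 'one', 'five')]:
    df = append_difference(df, a, b, new)

print(df.columns.get_level_values(0).unique().tolist())
```

Each call returns a new DataFrame, so the loop rebinds df instead of modifying it in place.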

If you don't need a MultiIndex in the output, use DataFrame.xs:

df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
print (df1)
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137
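Because xs drops the first level, the subtraction aligns on the remaining second-level labels rather than on column position. A short sketch that deliberately reverses the column order of one operand (three, two_reordered and df1 are illustrative names):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

three = df.xs('three', axis=1, level=0)                         # columns: foo, bar
two_reordered = df.xs('two', axis=1, level=0)[['bar', 'foo']]   # columns: bar, foo

# Alignment is by the 'second' labels, so foo is still paired with foo
# and bar with bar; only the column order of the result may differ.
df1 = three - two_reordered
print(df1)
```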

If you need a first level in the output, one possible solution is MultiIndex.from_product:

df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
df1.columns = pd.MultiIndex.from_product([['new'], df1.columns], 
                                         names=['first','second'])
print (df1)
first        new          
second       foo       bar
A      -0.861579  3.157731
B      -1.944822  0.772031
C       2.649912  2.621137
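Another way to put a first level back, as an alternative to MultiIndex.from_product: pd.concat with a dict uses the dict keys as the new outer column level, and names sets that level's name. A sketch, keeping 'new' as the placeholder label used above (diff and df1 are illustrative names):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

diff = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)

# The dict key becomes the outer column level; the inner level keeps
# its existing name ('second').
df1 = pd.concat({'new': diff}, axis=1, names=['first'])
print(df1)
```

Passing several keys in the same dict would stack several differences side by side in one call.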

answered Nov 15 '22 12:11

jezrael

You could try DataFrame.xs (cross-section):

df.xs('three', axis=1) - df.xs('two', axis=1)
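For reference, xs without a level argument looks the key up in the outermost column level, so this is equivalent to passing level=0; plain column selection with df['three'] - df['two'] also drops the first level. A sketch that checks the three forms against each other on the sample data (a, b and c are illustrative names):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],
                    ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

a = df.xs('three', axis=1) - df.xs('two', axis=1)                      # default level
b = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)    # explicit level
c = df['three'] - df['two']   # selecting a top-level label drops that level too

print(a.equals(b) and a.equals(c))
```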

answered Nov 15 '22 10:11

Chris Adams

Tags: python, pandas, dataframe