I have a dataframe with column multiindex that I need to slice and perform math operations between the slices.
# sample df
idx=pd.IndexSlice
np.random.seed(123)
tuples = list(zip(*[['one', 'one', 'two', 'two', 'three', 'three'],['foo', 'bar', 'foo', 'bar', 'foo', 'bar']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)
If I wanted to perform say addition/subtraction between individual columns, I could use index slice and do it like this:
df.loc[:,idx['three','foo']] - df.loc[:,idx['two','foo']]
However, if I want to use higher level slice it doesn't work and return NaNs:
# not working
df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]]
Is there an easy way to use higher level slices of the df and add/subtract corresponding columns only? My dataframe potentially contains hundreds of columns in multiindex. Thanks
pandas MultiIndex to ColumnsUse pandas DataFrame. reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero.
Output : As we can see in the output, midx MultiIndex has 3 levels.
To make the column an index, we use the Set_index() function of pandas. If we want to make one column an index, we can simply pass the name of the column as a string in set_index(). If we want to do multi-indexing or Hierarchical Indexing, we pass the list of column names in the set_index().
If need MultiIndex in output use rename
for same level od MultiIndex:
df = df.loc[:,idx['three',:]] - df.loc[:,idx['two',:]].rename(columns={'two':'three'})
print (df)
first three
second foo bar
A -0.861579 3.157731
B -1.944822 0.772031
C 2.649912 2.621137
Advantage is possible rename both levels to new index names and join to original:
df = (df.join(df.loc[:,idx['three',:]].rename(columns={'three':'four'}) -
df.loc[:,idx['two',:]].rename(columns={'two':'four'})))
print (df)
first one two three four \
second foo bar foo bar foo bar foo
A -1.085631 0.997345 0.282978 -1.506295 -0.578600 1.651437 -0.861579
B -2.426679 -0.428913 1.265936 -0.866740 -0.678886 -0.094709 -1.944822
C 1.491390 -0.638902 -0.443982 -0.434351 2.205930 2.186786 2.649912
first
second bar
A 3.157731
B 0.772031
C 2.621137
If not necessary, use DataFrame.xs
:
df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
print (df1)
second foo bar
A -0.861579 3.157731
B -1.944822 0.772031
C 2.649912 2.621137
If need first level one possible solution is MultiIndex.from_product
:
df1 = df.xs('three', axis=1, level=0) - df.xs('two', axis=1, level=0)
df1.columns = pd.MultiIndex.from_product([['new'], df1.columns],
names=['first','second'])
print (df1)
first new
second foo bar
A -0.861579 3.157731
B -1.944822 0.772031
C 2.649912 2.621137
You could try DataFrame.xs
(cross-section) :
df.xs(('three'), axis=1) - df.xs(('two'), axis=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With