With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')
I am struggling to do the same with a DataFrame that has a column MultiIndex:
df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails
Is this doable? Thanks...
(As to the motivation: query() seems to allow very concise selection on a DataFrame with a row multi-index and a single-level column index, for example:
df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')
I would like to do something similar with a DataFrame that is multi-indexed on both axes.)
pandas MultiIndex to columns: use the pandas DataFrame.reset_index() function to convert (move) the levels of a MultiIndex (multi-level index) into columns. The default for the drop parameter is drop=False, which keeps the index values as columns and gives the DataFrame a new default integer index starting from zero.
The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.
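To make the "array of tuples" view concrete, here is a short sketch using df2 from the question: each column label of a column MultiIndex is a tuple, which is why selecting a single column needs the full tuple.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.rand(10, 2), index=range(10), columns=[["A", "B"], ["C", "D"]])

# A list of arrays passed as columns= becomes a two-level MultiIndex
print(type(df2.columns).__name__)  # MultiIndex
print(list(df2.columns))           # [('A', 'C'), ('B', 'D')]

# Each label is a tuple, so the full tuple selects one column as a Series
print(df2[("A", "C")].shape)       # (10,)
```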
There's an open issue on GitHub for this, but in the meantime, one suggested workaround is to refer to the column via the DataFrame variable using the @ notation:
df2.query("@df2.A.C > 0.5")
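This is not a perfect workaround. Because it relies on attribute access (df2.A.C), it only works when every level label is a valid Python identifier; if your column labels contain spaces, you will need to rename them first.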
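A sketch comparing the @ workaround with two alternatives that avoid attribute access: plain boolean indexing with the full column tuple, and flattening the column MultiIndex into single-level names before calling query(). Fixed values replace np.random.rand so the selected rows are predictable; the "A_C" naming scheme (levels joined with "_") is an arbitrary assumption, not a pandas convention.

```python
import numpy as np
import pandas as pd

# Fixed values so the selected rows are predictable
df2 = pd.DataFrame(
    np.array([[0.2, 0.9], [0.7, 0.1], [0.6, 0.4], [0.3, 0.8]]),
    columns=[["A", "B"], ["C", "D"]],
)

# 1) The workaround above: reference the local DataFrame with @ and walk
#    down the column levels via attribute access
via_at = df2.query("@df2.A.C > 0.5")

# 2) Plain boolean indexing with the full column tuple (no query() involved)
via_mask = df2[df2[("A", "C")] > 0.5]

# 3) Flatten the column MultiIndex into single-level names such as "A_C";
#    after that, query() can refer to the column by name as usual
flat = df2.copy()
flat.columns = ["_".join(levels) for levels in flat.columns]
via_flat = flat.query("A_C > 0.5")

print(via_at.index.tolist())    # [1, 2]
print(via_mask.index.tolist())  # [1, 2]
print(via_flat.index.tolist())  # [1, 2]
```

Flattening also sidesteps the spaces caveat if you choose a separator and rename labels that contain spaces, at the cost of losing the hierarchical column structure in the flattened copy.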