
How to use pandas query() to correctly reference multiindex column headers in the query expression?

Tags: python, pandas

With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')

I am struggling to achieve the analogous result in a DataFrame with a column MultiIndex:

df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails

Is this doable? Thanks...

(As for the motivation: query() allows very concise selection on a DataFrame with a row MultiIndex and a single-level column index, for example:

df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')

I would like to do something similar with a DF multi-indexed on both axes...)

Slavoj asked Oct 21 '14 12:10




1 Answer

There is an open issue on GitHub for this. In the meantime, one suggested workaround is to reference the column through the DataFrame variable itself, using the @ prefix that query() provides for local variables:

df2.query("@df2.A.C > 0.5")
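For comparison, here is a minimal sketch that runs the workaround next to plain boolean indexing with a tuple key, which selects the same rows without going through query(). The seeded generator and the names via_query and via_mask are only illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded data so the example is reproducible (illustrative choice)
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.random((10, 2)), index=range(10),
                   columns=[["A", "B"], ["C", "D"]])

# Workaround: resolve the column through the local variable df2
via_query = df2.query("@df2.A.C > 0.5")

# Equivalent selection with a boolean mask and a tuple column key
via_mask = df2[df2[("A", "C")] > 0.5]

print(via_query.equals(via_mask))
```

The mask form is more verbose, but it does not depend on how query() parses column names.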

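This is not a perfect workaround: it relies on attribute access, so every header label involved must be a valid Python identifier. If your header names at any level contain spaces, you will need to remove or rename them first.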
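As a rough sketch of the renaming step, assuming the only problem is spaces in the labels, rename() with a function can be applied to every level of the column MultiIndex before querying. The labels "group a", "col c" and the variable names below are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical headers that contain spaces at both levels
df_spaces = pd.DataFrame(rng.random((10, 2)),
                         columns=[["group a", "group b"], ["col c", "col d"]])

# Replace spaces with underscores in the labels of every column level
safe = df_spaces.rename(columns=lambda label: label.replace(" ", "_"))

# The attribute-based workaround now applies to the renamed frame
result = safe.query("@safe.group_a.col_c > 0.5")
print(list(safe.columns))
```

If the original labels must be preserved, the mask safe[("group_a", "col_c")] > 0.5 can be applied to df_spaces instead, since both frames share the same row index.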

cs95 answered Oct 23 '22 06:10