How to use pandas query() to correctly reference multiindex column headers in the query expression?

Tags: python, pandas

With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')

I am struggling to achieve the same in a DataFrame with a column multi-index:

df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails

Is this doable? Thanks...

(As to the motivation: query() seems to allow for very concise selection on a DataFrame with a row multi-index and a single-level column index, for example:

df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')

I would like to do something similar with a DF multi-indexed on both axes...)


asked Oct 21 '14 12:10

Slavoj

1 Answer

There's an open issue on GitHub for this, but in the meantime, one suggested workaround is to refer to the column through the DataFrame variable itself, using @ notation:

df2.query("@df2.A.C > 0.5")

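This is not a perfect workaround. It relies on attribute access (df2.A.C), which only works when the labels are valid Python identifiers, so if your header names/levels contain spaces, you will need to remove/rename them first.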
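As a rough sketch (not an official pandas recipe), the snippet below runs the workaround on the question's df2 and checks it against plain boolean indexing with a tuple key, which avoids query() altogether. It then shows one possible way to handle labels with spaces: flattening the column multi-index into single-level names. The seeded generator, the frames df_sp and flat, the labels "A x"/"C y", and the "_" separator are all assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
df2 = pd.DataFrame(
    rng.random((10, 2)),
    index=range(10),
    columns=[["A", "B"], ["C", "D"]],
)

# The workaround from the answer: reach the column through the frame itself.
via_query = df2.query("@df2.A.C > 0.5")

# The same selection with boolean indexing and a tuple key, no query() involved.
via_mask = df2[df2[("A", "C")] > 0.5]
assert via_query.equals(via_mask)

# Hypothetical labels containing spaces: attribute access is not possible here,
# so flatten the two levels into identifier-style names before calling query().
df_sp = df2.copy()
df_sp.columns = pd.MultiIndex.from_tuples([("A x", "C y"), ("B x", "D y")])

flat = df_sp.copy()
flat.columns = ["_".join(levels).replace(" ", "_") for levels in df_sp.columns]
via_flat = flat.query("A_x_C_y > 0.5")
```

Flattening loses the two-level column structure, so if the multi-index is needed afterwards, the tuple-key mask is probably the simpler choice.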

answered Oct 23 '22 06:10

cs95
