pandas DataFrame.query expression that returns all rows by default

Tags: python, pandas, dataframe

I have discovered the pandas DataFrame.query method, and it does almost exactly what I need. (I had implemented my own parser for this because I hadn't realized query existed, but I should really be using the standard method.)

I would like my users to be able to specify the query in a configuration file. The syntax seems intuitive enough that I can expect my non-programmer (but engineer) users to figure it out.

There's just one thing missing: a way to select everything in the dataframe. Sometimes what my users want to use is every row, so they would put 'All' or something into that configuration option. In fact, that will be the default option.

I tried df.query('True') but that raised a KeyError. I tried df.query('1') but that returned the row with index 1. The empty string raised a ValueError.

The only things I can think of are 1) put an if clause every time I need to do this type of query (probably 3 or 4 times in the code) or 2) subclass DataFrame and either reimplement query, or add a query_with_all method:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    def query_with_all(self, query_string):
        if query_string.lower() == 'all':
            return self
        else:
            return self.query(query_string)

And then use my own class every time instead of the pandas one. Is this the only way to do this?

asked Oct 19 '17 03:10

moink

2 Answers

Keep things simple, and use a function:

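def query_with_all(data_frame, query_string):
    # "all" (in any letter case) is a sentinel meaning "no filtering".
    if query_string.lower() == "all":
        return data_frame
    # Anything else is passed through to DataFrame.query unchanged.
    return data_frame.query(query_string)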

Whenever you need this type of query, just call the function with the data frame and the query string. There's no need to scatter extra if statements through your code or to subclass pd.DataFrame.
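As a minimal sketch of how such a function could be wired to a configuration value: the config dictionaries, the row_filter key, and the sample columns below are hypothetical stand-ins, not anything from the question. A missing option falls back to "all", which matches the requirement that selecting every row is the default.

```python
import pandas as pd


def query_with_all(data_frame, query_string):
    """Return every row for the sentinel "all", else delegate to DataFrame.query."""
    # Normalise case and whitespace so "All", " ALL " and "all" behave alike.
    if query_string.strip().lower() == "all":
        return data_frame
    return data_frame.query(query_string)


# Hypothetical sample data and config values, for illustration only.
df = pd.DataFrame({"pressure": [1.0, 2.5, 4.0], "valve": ["a", "b", "a"]})
config = {"row_filter": "pressure > 2"}
empty_config = {}

filtered = query_with_all(df, config.get("row_filter", "all"))
everything = query_with_all(df, empty_config.get("row_filter", "all"))

print(len(filtered))    # 2
print(len(everything))  # 3
```

Note that the "all" branch returns the original object rather than a copy, whereas query returns a new DataFrame. If callers modify the result, returning data_frame.copy() may be the safer choice.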

If you're restricted to using df.query, you can define a global variable and reference it in the expression with the @ prefix:

ALL = slice(None)
df.query('@ALL', engine='python')

If you're not allowed to use global variables, and if your DataFrame isn't MultiIndexed, you can use:

df.query('tuple()')

All of these will properly handle NaN values.

answered Oct 20 '22 00:10

Joshua

df.query('ilevel_0 in ilevel_0') will always return the full dataframe, even when the index contains NaN values or the dataframe is completely empty.

In your particular case you could then define a global variable all_true = 'ilevel_0 in ilevel_0' (as suggested in the comments by Zero) so that your engineers could use the name of the global variable in their config file instead.

This expression is just a roundabout way of querying True, as you already tried to do. ilevel_0 is a more formal way of making sure you are referring to the index. See the docs for more details on using in and ilevel_0: https://pandas.pydata.org/pandas-docs/stable/indexing.html#the-query-method
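A short sketch of the alias idea, assuming the index is unnamed (pandas exposes an unnamed index level as ilevel_0 inside query expressions, while a named index is referenced by its name instead). The names ALL_ROWS and resolve_query are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical alias: the expression compares the index with itself,
# so every row matches.
ALL_ROWS = "ilevel_0 in ilevel_0"


def resolve_query(config_value):
    """Map the user-facing keyword "all" to the select-everything expression."""
    if config_value.strip().lower() == "all":
        return ALL_ROWS
    return config_value


# Illustrative data with a default (unnamed) integer index.
df = pd.DataFrame({"a": [10, 20, 30]})

all_rows = df.query(resolve_query("All"))
some_rows = df.query(resolve_query("a > 15"))

print(all_rows.equals(df))  # True
print(len(some_rows))       # 2
```

Because the keyword is translated into a real query expression, every call site can keep using df.query without a special case.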

answered Oct 20 '22 00:10

jorijnsmit
