
pandas dataframe multiline query

Say I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])

If I query it with the query method, a single-line expression works:

df.query('''a > 3 and b < 9''')

But spreading the same expression over several lines throws an error:

df.query(
    '''
        a > 3 and
        b < 9
    '''
)

I tried many variations of multiline strings but the result is always the following error:

~/ven/lib/python3.6/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    306     if multi_line and target is None:
    307         raise ValueError(
--> 308             "multi-line expressions are only valid in the "
    309             "context of data, use DataFrame.eval"
    310         )

ValueError: multi-line expressions are only valid in the context of data, use DataFrame.eval

Does anyone know how to make it work? In reality my query is very long, and writing it all on one line would be very inconvenient. I know I could use boolean indexing instead, but my question is only about how to pass a multi-line string to the query method.

Thank you

gioxc88 asked Aug 26 '20 12:08



2 Answers

Use a backslash ( \ ) as a line-continuation character at the end of every line that continues onto the next one.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])
print(df.query(
    '''
        a > 3 and \
        b < 9
    '''
))
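
Why this works can be checked directly. In a regular (non-raw) string literal, Python drops a backslash that is immediately followed by a newline, together with that newline, so the two condition lines reach query as a single line. The sketch below is only an illustration: the sample values in df are made up so the result is easy to verify, and the name non_blank is introduced here just to count the lines that survive.

```python
import pandas as pd

# Hypothetical sample data (not from the question), chosen so the result is easy to check.
df = pd.DataFrame({'a': [1, 5, 7, 4], 'b': [2, 9, 3, 8], 'c': [0, 1, 2, 3]})

expr = '''
    a > 3 and \
    b < 9
'''

# The backslash-newline pair is removed when Python reads the literal,
# so only one non-blank line is left in the string.
non_blank = [line for line in expr.splitlines() if line.strip()]
print(len(non_blank))  # 1

result = df.query(expr)
print(result)
```

Note that the backslash must be the very last character on the line; a trailing space after it would keep the newline in the string.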
Rakesh answered Oct 16 '22 18:10


You can remove the newline characters (\n) from the string before passing it to query, which turns the multi-line text into a single-line expression:

query_multiline = '''
  a > 3 and
  b < 9
'''

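# Replace each newline with a space so that tokens on adjacent lines
# cannot fuse together when a line is not indented (e.g. 'and' + 'b').
query_multiline = query_multiline.replace('\n', ' ')

df.query(query_multiline)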
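
For a long query it may be handy to wrap that clean-up in a small helper, or to avoid creating the newlines in the first place. The following is a minimal sketch, not a pandas feature: one_line is a hypothetical helper name, and the sample data is made up. It collapses every run of whitespace into one space, which would also alter whitespace inside quoted string values in the expression, so it is only suitable when the query contains none. The second call relies on Python concatenating adjacent string literals inside parentheses.

```python
import pandas as pd

# Hypothetical sample data (not from the question), chosen so the result is easy to check.
df = pd.DataFrame({'a': [1, 5, 7, 4], 'b': [2, 9, 3, 8], 'c': [0, 1, 2, 3]})


def one_line(expr):
    """Collapse every run of whitespace (newlines included) into a single space."""
    return ' '.join(expr.split())


query_multiline = '''
    a > 3 and
    b < 9
'''

flat = one_line(query_multiline)
print(flat)  # a > 3 and b < 9
result_a = df.query(flat)

# Alternative: adjacent string literals are joined by Python,
# so the expression never contains a newline.
result_b = df.query(
    'a > 3 and '
    'b < 9'
)
```

With the second form, each piece needs its own trailing (or leading) space, otherwise the joined expression would read 'a > 3 andb < 9'.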
Tiago Wutzke de Oliveira answered Oct 16 '22 19:10