Say I have a DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])
If I now try to query it using the query method, this works:
df.query('''a > 3 and b < 9''')
But this throws an error:
df.query(
'''
a > 3 and
b < 9
'''
)
I tried many variations of multiline strings but the result is always the following error:
~/ven/lib/python3.6/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
306 if multi_line and target is None:
307 raise ValueError(
--> 308 "multi-line expressions are only valid in the "
309 "context of data, use DataFrame.eval"
310 )
ValueError: multi-line expressions are only valid in the context of data, use DataFrame.eval
Does anyone know how to make it work?
The problem is that in reality I have a very long query, and it would be very inconvenient to write it all on one line.
I know I could use boolean indexing instead, but my question is only about how to use a multiline string with the query method.
Thank you
You can select rows from a pandas DataFrame based on column values or on multiple conditions using the DataFrame.loc[] indexer, the DataFrame.query() method, or DataFrame.apply() with a lambda function.
Rows can be filtered on several conditions by combining comparisons (>, <, <=, >= and ==) with logical AND and OR. loc[] is primarily label based, but it also accepts a boolean array to select a group of rows and columns.
With loc[] you can select a single value, slice out a particular column, or select rows and columns simultaneously: inside the square brackets, the row selector goes before the comma and the column selector after it.
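As a rough illustration of the selection styles described above, here is a minimal sketch. It reuses the question's random DataFrame with a fixed seed (the seed and the variable names are assumptions made for the example).

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the example is repeatable (an assumption)
df = pd.DataFrame(np.random.randint(10, size=(10, 3)), columns=['a', 'b', 'c'])

# Boolean mask: parenthesise each condition and combine with & (AND) or | (OR).
mask = (df['a'] > 3) & (df['b'] < 9)
rows_loc = df.loc[mask]

# The same filter written as a query string.
rows_query = df.query('a > 3 and b < 9')

# Rows and columns at once: row selector before the comma, column labels after it.
subset = df.loc[mask, ['a', 'c']]

# A single value, addressed by row label and column label.
value = df.loc[0, 'a']
```

Both rows_loc and rows_query should hold the same rows; which style to use is mostly a matter of readability.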
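Use the line-continuation character, a backslash (\), at the end of each line you want to continue. Inside a regular (non-raw) string literal, a backslash followed by a newline is dropped, so the query still reaches pandas as a single line.
Example: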
df = pd.DataFrame(np.random.randint(10, size=(10,3)), columns=['a', 'b', 'c'])
print(df.query(
'''
a > 3 and \
b < 9
'''
))
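A related option, sketched here as an alternative rather than as part of the answer above, is to rely on Python's implicit concatenation of adjacent string literals inside parentheses. The source stays multi-line, but the string passed to query contains no newlines. The extra condition on c and the fixed seed are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the example is repeatable (an assumption)
df = pd.DataFrame(np.random.randint(10, size=(10, 3)), columns=['a', 'b', 'c'])

# Adjacent string literals are joined at compile time into one string.
# Note the trailing space inside each piece, so the words do not run together.
query_expr = (
    'a > 3 and '
    'b < 9 and '
    'c != 0'
)

result = df.query(query_expr)
print(result)
```

This avoids the backslashes, at the cost of having to remember the trailing space in every piece.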
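You can replace the newline characters (\n) with spaces so that the query becomes a single line:
query_multiline = '''
a > 3 and
b < 9
'''
# Replace with a space, not an empty string, otherwise "and" and "b" are joined into "andb".
query_multiline = query_multiline.replace('\n', ' ')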
df.query(query_multiline)
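If you do this often, the newline handling can be wrapped in a small helper. The sketch below uses a hypothetical function name, flatten_query, which is not part of pandas. It collapses every run of whitespace (newlines, indentation) into a single space, so indented multi-line queries work too.

```python
import numpy as np
import pandas as pd

def flatten_query(expr: str) -> str:
    """Collapse all runs of whitespace, including newlines, into single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return ' '.join(expr.split())

np.random.seed(0)  # fixed seed so the example is repeatable (an assumption)
df = pd.DataFrame(np.random.randint(10, size=(10, 3)), columns=['a', 'b', 'c'])

query_multiline = '''
    a > 3 and
    b < 9
'''

flat = flatten_query(query_multiline)
result = df.query(flat)
print(flat)
print(result)
```

One caveat: the helper also collapses whitespace inside quoted string values in the query, so it is only safe when the query does not compare against strings containing consecutive spaces or newlines.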