I am comfortable with basic filtering and querying using Pandas. For example, if I have a dataframe called df
I can do df[df['field1'] < 2]
or df[df['field2'] < 3]
. I can also chain multiple criteria together, for example:
df[(df['field1'] < 3) & (df['field2'] < 2)]
.
What if I don't know in advance how many criteria I will need to use? Is there a way to "chain" an arbitrary number of these operations together? I would like to pass a list of filters such as [('field1', 3), ('field2', 2), ('field3', 4)]
which would result in the chaining of these 3 conditions together.
Thanks!
The pandas Series objects have the less than, greater than, etc operations as methods you can call. So df['field1'] < 3
becomes df['field1'].lt(3)
. This is not terribly important, but it makes the code more readable.
To implement what you are asking, you can use the reduce
function from functools, and the and_
(equivalent of &
) from the operator package.
from functools import reduce
from operator import and_
reduce(and_, (df.field1.lt(3), df.field2.lt(2), df.field3.lt(4)))
Approach 1
Use pd.DataFrame.query
c = [('field1', 3), ('field2', 2), ('field3', 4)]
f = '{0[0]} < {0[1]}'.format
df.query(' & '.join(f(t) for t in c))
Approach 2
c = [('field1', 3), ('field2', 2), ('field3', 4)]
df[df[[t[0] for t in c]].lt([t[1] for t in c]).all(1)]
Approach 3
Create a pd.Series
out of c
and then compare
c = [('field1', 3), ('field2', 2), ('field3', 4)]
s = pd.Series(dict(c))
df[df[s.index].lt(s).all(1)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With