Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter Pandas Dataframe using an arbitrary number of conditions

Tags:

python

pandas

I am comfortable with basic filtering and querying using Pandas. For example, if I have a dataframe called df I can do df[df['field1'] < 2] or df[df['field2'] < 3]. I can also chain multiple criteria together, for example:

df[(df['field1'] < 3) & (df['field2'] < 2)].

What if I don't know in advance how many criteria I will need to use? Is there a way to "chain" an arbitrary number of these operations together? I would like to pass a list of filters such as [('field1', 3), ('field2', 2), ('field3', 4)] which would result in the chaining of these 3 conditions together.

Thanks!

like image 475
speedyturkey Avatar asked Dec 14 '22 19:12

speedyturkey


2 Answers

The pandas Series objects have the less than, greater than, etc operations as methods you can call. So df['field1'] < 3 becomes df['field1'].lt(3). This is not terribly important, but it makes the code more readable.

To implement what you are asking, you can use the reduce function from functools, and the and_ (equivalent of &) from the operator package.

from functools import reduce
from operator import and_

reduce(and_, (df.field1.lt(3), df.field2.lt(2), df.field3.lt(4)))
like image 63
James Avatar answered May 21 '23 15:05

James


Approach 1
Use pd.DataFrame.query

c = [('field1', 3), ('field2', 2), ('field3', 4)]
f = '{0[0]} < {0[1]}'.format

df.query(' & '.join(f(t) for t in c))

Approach 2

c = [('field1', 3), ('field2', 2), ('field3', 4)]

df[df[[t[0] for t in c]].lt([t[1] for t in c]).all(1)]

Approach 3
Create a pd.Series out of c and then compare

c = [('field1', 3), ('field2', 2), ('field3', 4)]

s = pd.Series(dict(c))

df[df[s.index].lt(s).all(1)]
like image 40
piRSquared Avatar answered May 21 '23 15:05

piRSquared