
Using a list of conditions to filter a DataFrame in Pandas

I want a function that takes a list of conditions, of any length, and combines them all with the & operator. Example code below.

import pandas as pd

df = pd.DataFrame(columns=['Sample', 'DP', 'GQ', 'AB'],
                  data=[
                      ['HG_12_34', 200, 35, 0.4],
                      ['HG_12_34_2', 50, 45, 0.9],
                      ['KD_89_9', 76, 67, 0.7],
                      ['KD_98_9_2', 4, 78, 0.02],
                      ['LG_3_45', 90, 3, 0.8],
                      ['LG_3_45_2', 15, 12, 0.9]
                  ])


def some_func(df, cond_list):

    # wrap ampersand between multiple conditions
    all_conds = ?

    return df[all_conds]

cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40
cond3 = df['AB'] < 0.4


some_func(df, [cond1, cond2]) # should return df[cond1 & cond2]
some_func(df, [cond1, cond3, cond2]) # should return df[cond1 & cond3 & cond2]

I would appreciate any help with this.

David Ross asked Apr 01 '17 18:04



1 Answer

You can use functools.reduce for that:

from functools import reduce

def some_func(df, cond_list):
    return df[reduce(lambda x,y: x&y, cond_list)]

Or, as @AryaMcCarthy says, you can use and_ from the operator module:

from functools import reduce
from operator import and_

def some_func(df, cond_list):
    return df[reduce(and_, cond_list)]

Or with NumPy, as @ayhan says, which also has a logical-and reduction:

from numpy import logical_and

def some_func(df, cond_list):
    return df[logical_and.reduce(cond_list)]
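
One difference between the variants may be worth knowing: reduce with and_ combines pandas Series and returns a Series, while logical_and.reduce converts the list to a NumPy array first and returns a plain boolean array. A minimal sketch on a made-up three-row frame (the data and the names series_mask and array_mask are only for illustration):

```python
from functools import reduce
from operator import and_

import numpy as np
import pandas as pd

# Small illustrative frame, not the data from the question.
df = pd.DataFrame({'DP': [200, 50, 4], 'GQ': [35, 45, 78]})
conds = [df['DP'] > 40, df['GQ'] > 40]

series_mask = reduce(and_, conds)          # pandas Series carrying df's index
array_mask = np.logical_and.reduce(conds)  # plain NumPy boolean array

print(type(series_mask).__name__)  # Series
print(type(array_mask).__name__)   # ndarray

# Both masks select the same rows here, because every condition
# was built from df and therefore shares its index.
print(df[series_mask].equals(df[array_mask]))  # True
```

As long as all conditions come from the same DataFrame, either mask can be used for indexing. The NumPy version combines values by position and does not align on index labels.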

All three versions produce the following output for your sample input:

>>> some_func(df, [cond1, cond2])
       Sample  DP  GQ   AB
1  HG_12_34_2  50  45  0.9
2     KD_89_9  76  67  0.7
>>> some_func(df, [cond1, cond2, cond3])
Empty DataFrame
Columns: [Sample, DP, GQ, AB]
Index: []
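
Since the question asks for a list of any length, note that reduce raises a TypeError when the list is empty and no initial value is given. A possible sketch that treats an empty list as "keep every row" by seeding reduce with an all-True mask (the name filter_all and that empty-list behaviour are assumptions, not part of the original answer):

```python
from functools import reduce
from operator import and_

import pandas as pd


def filter_all(df, cond_list):
    """Return the rows of df for which every condition in cond_list holds."""
    # Seed the reduction with an all-True mask so that an empty
    # cond_list keeps every row instead of raising TypeError.
    keep_all = pd.Series(True, index=df.index)
    return df[reduce(and_, cond_list, keep_all)]


df = pd.DataFrame(columns=['Sample', 'DP', 'GQ', 'AB'],
                  data=[
                      ['HG_12_34', 200, 35, 0.4],
                      ['HG_12_34_2', 50, 45, 0.9],
                      ['KD_89_9', 76, 67, 0.7],
                      ['KD_98_9_2', 4, 78, 0.02],
                      ['LG_3_45', 90, 3, 0.8],
                      ['LG_3_45_2', 15, 12, 0.9]
                  ])

cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40

print(filter_all(df, [cond1, cond2]))  # rows 1 and 2, as in the output above
print(len(filter_all(df, [])))         # 6, nothing is filtered out
```

Whether an empty list should keep every row or be rejected is a design choice. Raising an explicit error may be preferable if an empty list most likely signals a bug in the caller.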

Willem Van Onsem answered Sep 21 '22 19:09