I wish to have a function which takes a list of conditions, of any length, and places an ampersand between all the conditions. Example code below.
import pandas as pd

df = pd.DataFrame(columns=['Sample', 'DP', 'GQ', 'AB'],
                  data=[
                      ['HG_12_34', 200, 35, 0.4],
                      ['HG_12_34_2', 50, 45, 0.9],
                      ['KD_89_9', 76, 67, 0.7],
                      ['KD_98_9_2', 4, 78, 0.02],
                      ['LG_3_45', 90, 3, 0.8],
                      ['LG_3_45_2', 15, 12, 0.9]
                  ])
def some_func(df, cond_list):
    # wrap ampersand between multiple conditions
    all_conds = ?
    return df[all_conds]
cond1 = df['DP'] > 40
cond2 = df['GQ'] > 40
cond3 = df['AB'] < 0.4
some_func(df, [cond1, cond2]) # should return df[cond1 & cond2]
some_func(df, [cond1, cond3, cond2]) # should return df[cond1 & cond3 & cond2]
I would appreciate any help with this.
Using loc to Filter With Multiple Conditions
The loc indexer in pandas accesses groups of rows or columns by label or by boolean mask. Write each condition you want to apply and combine them with the & operator. The result is a pandas DataFrame containing only the rows that satisfy every condition.
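As a minimal sketch of the loc approach described above, the snippet below reuses the sample data from the question. The variable names mask and filtered are illustrative only, and the choice of columns to keep is an assumption made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["Sample", "DP", "GQ", "AB"],
    data=[
        ["HG_12_34", 200, 35, 0.4],
        ["HG_12_34_2", 50, 45, 0.9],
        ["KD_89_9", 76, 67, 0.7],
        ["KD_98_9_2", 4, 78, 0.02],
        ["LG_3_45", 90, 3, 0.8],
        ["LG_3_45_2", 15, 12, 0.9],
    ],
)

# Each comparison is parenthesised because & binds more tightly than > and <.
mask = (df["DP"] > 40) & (df["GQ"] > 40)

# The first argument picks rows by boolean mask, the second picks columns by label.
filtered = df.loc[mask, ["Sample", "DP", "GQ"]]
print(filtered)
```

Passing the mask to loc rather than to plain df[...] makes it possible to select columns in the same step.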
You can use df[df["Courses"] == 'Spark'] to filter rows by a single condition in a pandas DataFrame.
Filtering by multiple columns works the same way. With one column, we compare that column's values against a condition; with several columns, we build one condition per column and combine them with the element-wise AND operator & (not Python's and, and not &&), wrapping each comparison in parentheses.
Using isin() to Select Rows From a List of Values
The DataFrame.isin() method filters rows whose values appear in a list. You can store the list in a variable and pass it to isin(), or pass the list directly.
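A short sketch of isin() follows, using a cut-down version of the question's data. The list wanted and the result names selected and excluded are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sample": ["HG_12_34", "KD_89_9", "LG_3_45", "LG_3_45_2"],
        "DP": [200, 76, 90, 15],
    }
)

# The list of accepted values can live in a variable...
wanted = ["KD_89_9", "LG_3_45"]
selected = df[df["Sample"].isin(wanted)]

# ...and ~ inverts the mask to keep only the rows NOT in the list.
excluded = df[~df["Sample"].isin(wanted)]

print(selected)
print(excluded)
```

Because isin() returns a boolean Series, it can be placed in a condition list alongside comparisons such as df['DP'] > 40.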
You can use functools.reduce for that:
from functools import reduce

def some_func(df, cond_list):
    return df[reduce(lambda x, y: x & y, cond_list)]
Or, like @AryaMcCarthy says, you can use and_ from the operator module:
from functools import reduce
from operator import and_

def some_func(df, cond_list):
    return df[reduce(and_, cond_list)]
Or, like @ayhan says, with numpy, which also provides a logical-and reduction:
from numpy import logical_and

def some_func(df, cond_list):
    return df[logical_and.reduce(cond_list)]
All three versions produce - for your sample input - the following output:
>>> some_func(df, [cond1, cond2])
       Sample  DP  GQ   AB
1  HG_12_34_2  50  45  0.9
2     KD_89_9  76  67  0.7
>>> some_func(df, [cond1, cond2, cond3])
Empty DataFrame
Columns: [Sample, DP, GQ, AB]
Index: []
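One edge case worth noting: functools.reduce raises a TypeError when it is given an empty list and no initial value. The sketch below is one possible way to handle that, assuming an empty condition list should mean "no filtering". The name filter_all and the all-True starting mask are assumptions added here, not part of the answers above.

```python
from functools import reduce
from operator import and_

import pandas as pd


def filter_all(df, cond_list):
    """Return the rows of df that satisfy every condition in cond_list."""
    # Start from an all-True mask so that an empty cond_list keeps every row.
    all_true = pd.Series(True, index=df.index)
    return df[reduce(and_, cond_list, all_true)]


df = pd.DataFrame(
    columns=["Sample", "DP", "GQ", "AB"],
    data=[
        ["HG_12_34", 200, 35, 0.4],
        ["HG_12_34_2", 50, 45, 0.9],
        ["KD_89_9", 76, 67, 0.7],
        ["KD_98_9_2", 4, 78, 0.02],
        ["LG_3_45", 90, 3, 0.8],
        ["LG_3_45_2", 15, 12, 0.9],
    ],
)

cond1 = df["DP"] > 40
cond2 = df["GQ"] > 40

print(filter_all(df, [cond1, cond2]))  # same rows as df[cond1 & cond2]
print(len(filter_all(df, [])))         # no conditions, so all rows are kept
```

Whether an empty list should return every row or raise an error is a design choice; the initial value simply makes the first behaviour explicit.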