Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Filtering multiple conditions

Tags:

python

pandas

I'm trying to do boolean indexing with a couple conditions using Pandas. My original DataFrame is called df. If I perform the below, I get the expected result:

temp = df[df["bin"] == 3]
temp = temp[(~temp["Def"])]
temp = temp[temp["days since"] > 7]
temp.head()

However, if I do this (which I think should be equivalent), I get no rows back:

temp2 = df[df["bin"] == 3]
temp2 = temp2[~temp2["Def"] & temp2["days since"] > 7]
temp2.head()

Any idea what accounts for the difference?

like image 859
anon_swe Avatar asked Feb 25 '18 21:02

anon_swe


People also ask

How do I use multiple conditions in pandas?

Using Loc to Filter With Multiple Conditions The loc function in pandas can be used to access groups of rows or columns by label. Add each condition you want to be included in the filtered result and concatenate them with the & operator. You'll see our code sample will return a pd. dataframe of our filtered rows.

How do you filter by condition in python?

Filter Rows by Condition You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows. You can also write the above statement with a variable.


4 Answers

Use () because operator precedence:

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, create conditions on separate rows:

cond1 = df["bin"] == 3    
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

df = pd.DataFrame({'Def':[True] *2 + [False]*4,
                   'days since':[7,8,9,14,2,13],
                   'bin':[1,3,5,3,3,3]})

print (df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13


temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print (temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13
like image 154
jezrael Avatar answered Oct 01 '22 22:10

jezrael


OR

 df_train[(df_train["fold"]==1) | (df_train["fold"]==2)]

AND

 df_train[(df_train["fold"]==1) & (df_train["fold"]==2)]
like image 21
RAHUL KUMAR Avatar answered Oct 01 '22 22:10

RAHUL KUMAR


Alternatively, you can use the method query:

df.query('not Def & (`days since` > 7) & (bin == 3)')
like image 22
Mykola Zotko Avatar answered Oct 01 '22 23:10

Mykola Zotko


If you want multiple conditions:

Del_Det_5k_top_10 = Del_Det[(Del_Det['State'] == 'NSW') & (Del_Det['route'] == 2) |
                            (Del_Det['State'] == 'VIC') & (Del_Det['route'] == 3)]
like image 30
Loku Avatar answered Oct 01 '22 22:10

Loku