
Pandas filter rows based on multiple conditions

Some values in the risk column are not Small, Medium, or High. I want to delete every row whose value is not one of these three. I tried the following:

df = df[(df.risk == "Small") | (df.risk == "Medium") | (df.risk == "High")]

But this returns an empty DataFrame. How can I filter them correctly?

ArtDijk asked Apr 27 '14 13:04



2 Answers

I think you want:

df = df[df.risk.isin(["Small", "Medium", "High"])]

Example:

In [5]:
import pandas as pd
df = pd.DataFrame({'risk':['Small','High','Medium','Negligible', 'Very High']})
df

Out[5]:

         risk
0       Small
1        High
2      Medium
3  Negligible
4   Very High

[5 rows x 1 columns]

In [6]:

df[df.risk.isin(['Small','Medium','High'])]

Out[6]:

     risk
0   Small
1    High
2  Medium

[3 rows x 1 columns]
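
The question does not show the data, so the cause of the empty result is not known. One possible cause, and this is only an assumption, is stray whitespace or inconsistent capitalization in the column, which makes exact comparisons fail. A minimal sketch with made-up sample data that normalizes the strings before calling isin:

```python
import pandas as pd

# Hypothetical sample data with stray whitespace and mixed case
df = pd.DataFrame(
    {"risk": [" Small", "HIGH ", "Medium", "Negligible", "Very High"]}
)

allowed = ["Small", "Medium", "High"]

# Exact matching only finds the one value that is already clean
exact = df[df["risk"].isin(allowed)]

# Strip whitespace and normalize capitalization before matching
normalized = df["risk"].str.strip().str.title()
filtered = df[normalized.isin(allowed)]

print(filtered)
```

The mask is built from the normalized copy, so the rows kept in filtered still hold their original, uncleaned values. Assign normalized back to df["risk"] first if the cleaned strings are wanted.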
EdChum answered Sep 20 '22 15:09


Another nice and readable approach is the following:

small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"

Then you can use it like this:

df[small_risk | medium_risk | high_risk]

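Masks can also be combined with & (logical AND), but only when the conditions can hold at the same time, for example conditions on different columns. The following always returns an empty DataFrame, because a single value cannot be both Small and Medium:

df[small_risk & medium_risk]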
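
For reference, the named-mask approach can be sketched as a self-contained script. It reuses the sample data from the answer above. The names keep, kept, and dropped are illustrative, and the ~ operator is shown as one way to inspect the rows that would be removed:

```python
import pandas as pd

df = pd.DataFrame(
    {"risk": ["Small", "High", "Medium", "Negligible", "Very High"]}
)

# One boolean mask per accepted value
small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"

# | keeps a row if any of the masks is True for it
keep = small_risk | medium_risk | high_risk

kept = df[keep]       # rows with an accepted value
dropped = df[~keep]   # ~ inverts the mask: rows that would be deleted

print(kept)
print(dropped)
```

For this data the combined mask selects the same rows as df["risk"].isin(["Small", "Medium", "High"]).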
Rafael answered Sep 21 '22 15:09