
Filtering pandas dataframe with multiple Boolean columns

I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.

Sample data:

A | B | C | D
John Doe | 45 | True | False
Jane Smith | 32 | False | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

The dtype for columns C and D is Boolean. I want to create a new df (df1) with only the rows where either C or D is True. It should look like this:

A | B | C | D
John Doe | 45 | True | False
Alan Holmes | 55 | False | True
Eric Lamar | 29 | True | True

I've tried something like this, which fails because it can't handle the Boolean type:

df1 = df[(df['C']=='True') or (df['D']=='True')] 

Any ideas?

Maya Harary asked Sep 13 '17 22:09




2 Answers

In [82]: d
Out[82]:
             A   B      C      D
0     John Doe  45   True  False
1   Jane Smith  32  False  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 1:

In [83]: d.loc[d.C | d.D]
Out[83]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 2:

In [94]: d[d[['C','D']].any(axis=1)]
Out[94]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Solution 3:

In [95]: d.query("C or D")
Out[95]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
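For anyone who wants to reproduce the three solutions above outside an IPython session, here is a minimal sketch. It rebuilds the sample data from the question; the result names `by_or`, `by_any` and `by_query` are made up for illustration, and `axis=1` is passed by keyword on the assumption that a recent pandas version is in use.

```python
import pandas as pd

# Sample data from the question; `d` matches the name used in the answer.
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Solution 1: element-wise OR of the two boolean columns.
by_or = d.loc[d["C"] | d["D"]]

# Solution 2: row-wise any() over the selected boolean columns.
by_any = d[d[["C", "D"]].any(axis=1)]

# Solution 3: the same condition written as a query string.
by_query = d.query("C or D")

# All three approaches should select the same rows.
print(by_or.equals(by_any) and by_or.equals(by_query))
print(by_or.index.tolist())
```

Solution 2 is probably the easiest to extend when there are more than two boolean columns, since only the column list changes.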

PS If you change your solution to:

df[(df['C']==True) | (df['D']==True)] 

it'll work too
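To illustrate why the original attempt breaks, here is a small sketch under the assumption that C and D really have bool dtype. It shows two separate problems: comparing against the string 'True' matches nothing, and Python's `or` asks a whole Series for a single truth value. The names `string_matches` and `or_error` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Problem 1: a boolean column never equals the string 'True',
# so this mask is False for every row.
string_matches = df["C"] == "True"
print(string_matches.any())

# Problem 2: `or` needs one truth value per operand, which a Series
# does not have, so pandas raises ValueError. Use `|` instead.
or_error = None
try:
    df[(df["C"] == True) or (df["D"] == True)]
except ValueError as exc:
    or_error = exc
print(type(or_error).__name__)
```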

Pandas docs - boolean indexing


Why should we NOT use the "PEP-compliant" df["col_name"] is True instead of df["col_name"] == True?

In [11]: df = pd.DataFrame({"col":[True, True, True]})

In [12]: df
Out[12]:
    col
0  True
1  True
2  True

In [13]: df["col"] is True
Out[13]: False               # <----- oops, that's not exactly what we wanted

MaxU - stop WAR against UA answered Sep 19 '22 14:09
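As a rough sketch of the difference: `is` tests whether the Series object itself is the singleton `True`, while `==` compares element by element. If the column already has bool dtype, it can also be used as the mask directly. The mixed values and variable names below are hypothetical and chosen only to make the result visible.

```python
import pandas as pd

df = pd.DataFrame({"col": [True, False, True]})

# Identity check on the Series object: always a plain Python False.
identity_check = df["col"] is True

# Element-wise comparison: a boolean Series usable as a mask.
elementwise = df["col"] == True

# A bool-dtype column can serve as the mask without any comparison.
direct = df[df["col"]]

print(identity_check)
print(elementwise.tolist())
print(direct.index.tolist())
```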


Hooray! More options!

np.where

df[np.where(df.C | df.D, True, False)]
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

pd.Series.where on df.index

df.loc[df.index.where(df.C | df.D).dropna()]
               A   B      C      D
0.0     John Doe  45   True  False
2.0  Alan Holmes  55  False   True
3.0   Eric Lamar  29   True   True

df.select_dtypes

df[df.select_dtypes([bool]).any(axis=1)]
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

Abusing np.select

df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True

cs95 answered Sep 17 '22 14:09