I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.
Sample data:
A           | B  | C     | D
John Doe    | 45 | True  | False
Jane Smith  | 32 | False | False
Alan Holmes | 55 | False | True
Eric Lamar  | 29 | True  | True
The dtype for columns C and D is Boolean. I want to create a new df (df1) with only the rows where either C or D is True. It should look like this:
A           | B  | C     | D
John Doe    | 45 | True  | False
Alan Holmes | 55 | False | True
Eric Lamar  | 29 | True  | True
I've tried something like this, which fails because it can't handle the Boolean type:
df1 = df[(df['C']=='True') or (df['D']=='True')]
Any ideas?
Using loc to Filter With Multiple Conditions
The loc indexer in pandas selects groups of rows or columns by label or by Boolean mask. Write each condition you want in the filter and combine them with the & operator (every condition must hold) or the | operator (at least one condition must hold). The result is a pandas DataFrame containing only the matching rows.
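As a rough illustration of combining conditions with loc, here is a minimal sketch built on the sample data from the question. The threshold of 30 on column B is a made-up value used only to show the & case; it is not part of the original question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# & keeps rows where every condition holds.
# Comparisons need their own parentheses because & binds tighter than >.
# (The value 30 is an assumption for illustration only.)
both = df.loc[(df["B"] > 30) & df["C"]]

# | keeps rows where at least one condition holds (the case asked about here)
either = df.loc[df["C"] | df["D"]]

print(both)
print(either)
```

Because C and D are already Boolean, they can serve as masks directly, with no comparison against True.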
Using query() to Filter by Column Value in pandas DataFrame
The query() method filters rows using an expression, written as a string, that refers to columns by name. It returns a new DataFrame containing the rows for which the expression evaluates to True.
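A small sketch of query() on the sample data from the question may help. The variable min_age and its value are hypothetical and appear only to show how a local Python variable can be referenced with the @ prefix.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Column names can be used directly inside the expression string
either = df.query("C or D")

# A local variable is referenced with the @ prefix.
# min_age is a hypothetical threshold, not part of the original question.
min_age = 30
older_and_flagged = df.query("B > @min_age and (C or D)")

print(either)
print(older_and_flagged)
```

The original df is left unchanged; each call produces a separate filtered DataFrame.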
In [82]: d
Out[82]:
             A   B      C      D
0     John Doe  45   True  False
1   Jane Smith  32  False  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 1:
In [83]: d.loc[d.C | d.D]
Out[83]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 2:
In [94]: d[d[['C','D']].any(axis=1)]
Out[94]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 3:
In [95]: d.query("C or D")
Out[95]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
PS If you change your solution to:
df[(df['C']==True) | (df['D']==True)]
it'll work too
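To make the difference from the original attempt concrete, the sketch below reproduces the two things that go wrong in it: comparing a Boolean column to the string 'True', and joining the masks with Python's `or`. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Comparing a Boolean column to the string 'True' matches no rows at all
string_match = (df["C"] == "True").any()

# `or` asks Python for one truth value for a whole Series, which pandas refuses
try:
    df[(df["C"] == True) or (df["D"] == True)]
    or_error = None
except ValueError as exc:
    or_error = str(exc)

# `|` combines the two masks element by element
df1 = df[df["C"] | df["D"]]

print(string_match)
print(or_error)
print(df1)
```

Since C and D are already Boolean, the `== True` comparison is optional; `df["C"] | df["D"]` gives the same mask.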
Pandas docs - boolean indexing
Why should we NOT use the "PEP-compliant" df["col_name"] is True instead of df["col_name"] == True?
In [11]: df = pd.DataFrame({"col":[True, True, True]})

In [12]: df
Out[12]:
    col
0  True
1  True
2  True

In [13]: df["col"] is True
Out[13]: False  # <----- oops, that's not exactly what we wanted
Hooray! More options!
np.where
df[np.where(df.C | df.D, True, False)]

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
pd.Series.where on df.index
df.loc[df.index.where(df.C | df.D).dropna()]

               A   B      C      D
0.0     John Doe  45   True  False
2.0  Alan Holmes  55  False   True
3.0   Eric Lamar  29   True   True
df.select_dtypes
df[df.select_dtypes([bool]).any(axis=1)]

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
np.select
df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()

             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True