I have a df, and I want to run something like:
subsetdf= df.loc[(df['Item_Desc'].str.contains('X')==True) or \
(df['Item_Desc'].str.contains('Y')==True ),:]
to select all rows whose Item_Desc column contains the substring "X" or "Y".
When I run that, I get this error:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any help?
You can get a pandas.Series of bool that is the AND of two conditions using &. A condition can also be negated with ~, so a second condition written as ~(a == b) is equivalent to a != b.
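A minimal sketch of combining two conditions with & follows. The DataFrame, the qty column, and the values in it are made up for illustration; only Item_Desc comes from the question.

```python
import pandas as pd

# Hypothetical data; "qty" is an assumed column used only for this example.
df = pd.DataFrame({"Item_Desc": ["X1", "XY", "Y2", "Z3"], "qty": [1, 5, 5, 2]})

# First condition: a boolean Series, one value per row.
has_x = df["Item_Desc"].str.contains("X")

# Second condition written with != ...
both = has_x & (df["qty"] != 5)

# ... and the same condition written with == and ~ (element-wise NOT).
same = has_x & ~(df["qty"] == 5)
```

Each condition is wrapped in parentheses because & and ~ bind more tightly than comparison operators such as == and !=.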
You can select rows from a pandas DataFrame based on column values or on multiple conditions using the DataFrame.loc[] indexer, DataFrame.query(), or DataFrame.apply() with a lambda function.
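The three approaches might look like the sketch below. The data and the qty column are assumptions for illustration, and the specific condition is arbitrary.

```python
import pandas as pd

# Hypothetical data; "qty" is an assumed column used only for this example.
df = pd.DataFrame({"Item_Desc": ["X1", "Y2", "Z3"], "qty": [1, 5, 2]})

# 1. Boolean mask passed to .loc[] (element-wise &, each condition in parentheses).
by_loc = df.loc[(df["qty"] > 1) & (df["Item_Desc"] != "Z3")]

# 2. The same filter as a query string; "and" is allowed inside query().
by_query = df.query("qty > 1 and Item_Desc != 'Z3'")

# 3. Row-wise lambda via apply(); plain "and" works here because each
#    comparison yields a single boolean per row, not a Series.
by_apply = df[df.apply(lambda row: row["qty"] > 1 and row["Item_Desc"] != "Z3", axis=1)]
```

The row-wise apply() version is usually the slowest of the three on large frames, since it calls the lambda once per row.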
Use | instead of or. So:
df.loc[(cond1) | (cond2), :]
The or operator wants to compare two boolean values (or two expressions that evaluate to True or False). But a Series (or numpy array) does not simply evaluate to True or False, and in this case we want to compare both Series element-wise. For this you can use |, which is called 'bitwise or'.
Pandas follows the numpy conventions here. See the pandas docs for an explanation of it.
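Applied to the question, this could look like the sketch below. The sample rows are made up, and na=False is an added assumption so that missing descriptions count as non-matches.

```python
import pandas as pd

# Hypothetical sample data, including a missing description.
df = pd.DataFrame({"Item_Desc": ["X widget", "Y gadget", "Z gizmo", None]})

# Each condition is a boolean Series; na=False treats missing values as False.
has_x = df["Item_Desc"].str.contains("X", na=False)
has_y = df["Item_Desc"].str.contains("Y", na=False)

# Python's "or" asks for a single truth value of a whole Series and fails.
try:
    has_x or has_y
except ValueError as exc:
    error_message = str(exc)

# "|" combines the two Series element-wise instead.
subsetdf = df.loc[has_x | has_y, :]

# Alternative: one regex with alternation, since str.contains uses regex by default.
subset_regex = df.loc[df["Item_Desc"].str.contains("X|Y", na=False), :]
```

The == True comparisons from the question are not needed, because str.contains already returns booleans once na=False is set.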