
Selecting rows of a dataframe based on two conditions in Pandas python

Tags:

python

pandas

I have a df, and I want to run something like:

subsetdf= df.loc[(df['Item_Desc'].str.contains('X')==True) or \
                 (df['Item_Desc'].str.contains('Y')==True ),:]

to select all rows whose Item_Desc column contains the substring "X" or "Y". When I run it, I get this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any help?

asked Aug 31 '14 23:08 by wolfsatthedoor




1 Answer

Use | instead of or. So:

df.loc[(cond1) | (cond2), :]

The or operator expects two boolean values (or two expressions that evaluate to True or False). But a Series (or NumPy array) does not simply evaluate to True or False, and in this case we want to combine both Series element-wise. For this you can use |, which is called 'bitwise or'. The parentheses around each condition are needed because | binds more tightly than comparison operators such as ==.
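As a minimal sketch of the fix applied to the question's filter: the sample rows below are made up for illustration, and only the column name Item_Desc comes from the question. Passing na=False to str.contains is an extra assumption so that missing descriptions count as non-matches.

```python
import pandas as pd

# Hypothetical sample data; only the column name Item_Desc is from the question.
df = pd.DataFrame({"Item_Desc": ["X-ray film", "Yellow tape", "plain box", None]})

# Each call returns a boolean Series; na=False treats missing values as "no match".
has_x = df["Item_Desc"].str.contains("X", na=False)
has_y = df["Item_Desc"].str.contains("Y", na=False)

# Element-wise OR: a row is kept if either mask is True for that row.
subsetdf = df.loc[has_x | has_y, :]
print(subsetdf)
```

Naming the two masks first also sidesteps the precedence issue, since no comparison operator sits next to |.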

Pandas follows the NumPy conventions here. See the pandas docs for an explanation.
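The same behaviour can be seen with plain NumPy arrays. This is only an illustrative sketch with made-up arrays a and b, not something taken from the pandas docs:

```python
import numpy as np

# Hypothetical boolean arrays for illustration.
a = np.array([True, False, False])
b = np.array([False, False, True])

# | combines the arrays element by element.
either = a | b

# Python's `or` needs one truth value for the whole array,
# so it raises ValueError when the array has more than one element.
try:
    a or b
    raised = False
except ValueError:
    raised = True

print(either, raised)
```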

answered Sep 28 '22 01:09 by joris