
Pandas slicing/selecting with multiple conditions combined with "or"

When I select by chaining different conditions with "AND" the selection works fine. When I select by chaining conditions with "OR" the selection throws an error.

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], 
...     columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
>>> df.loc[(df.a != 1) & (df.b < 5)]
   a  b  c
1  2  3  5
3  3  2  5
>>> df.loc[(df.a != 1) or (df.b < 5)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would expect it to return the whole dataframe as all rows meet this condition.

asked Feb 07 '17 05:02 by jtorca




1 Answer

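The important thing to note is that & is not the same as and; they are different operators, so the "or" counterpart of & is |, not or.

In plain Python, & and | are the bitwise operators, whereas and and or are the logical operators, which test the truth value of a whole object. A Series with more than one element has no single truth value, which is why the "or" version raises the ValueError shown in the question.

In pandas, & and | are overloaded to operate element-wise on Series, so (df.a != 1) | (df.b < 5) combines the two boolean masks row by row.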
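To make the difference concrete, here is a minimal sketch using the DataFrame from the question. The names left, right, mask, result and or_raised are only illustrative and do not come from the original post. It shows that `or` fails because Python asks for the truth value of the left-hand Series, while `|` combines the two masks element-wise.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

left = df.a != 1   # boolean Series: False, True, True, True
right = df.b < 5   # boolean Series: True, True, False, True

# `or` first asks Python for bool(left); pandas refuses because a
# Series with several values has no single truth value.
try:
    left or right
    or_raised = False
except ValueError:
    or_raised = True

# `|` dispatches to the Series' overloaded operator, which combines
# the two masks row by row.
mask = left | right
result = df.loc[mask]
```

Every row satisfies at least one of the two conditions, so the mask is True everywhere and result contains the whole DataFrame, which is what the question expected.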

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b',
   ...:  'c'])

In [4]: df
Out[4]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
   a  b  c
1  2  3  5
3  3  2  5

In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
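One related detail, sketched here as an aside rather than as part of the original answer: the parentheses around each comparison matter, because | binds more tightly than != and < in Python. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=['a', 'b', 'c'])

# With parentheses, each comparison yields its own boolean mask first.
with_parens = df.loc[(df.a != 1) | (df.b < 5)]

# Without them, Python reads `df.a != 1 | df.b < 5` as the chained
# comparison `df.a != (1 | df.b) < 5`, which again needs the truth
# value of a Series and fails.
try:
    df.loc[df.a != 1 | df.b < 5]
    unparenthesized_raised = False
except ValueError:
    unparenthesized_raised = True
```

Keeping every condition in its own pair of parentheses avoids this, whether the masks are joined with & or with |.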
answered Oct 14 '22 05:10 by Steve Barnes