I understand the pandas docs explain that using | and & instead of or and and for boolean indexing is the convention, but I was wondering why.
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6,4), index=list('abcdef'), columns=list('ABCD'))
print(df[(df.A < .5) | (df.B > .5)])
print(df[(df.A < .5) or (df.B > .5)])
Returns the following:
A B C D
a -0.284669 -0.277413 2.524311 -1.386008
b -0.190307 0.325620 -0.367727 0.347600
c -0.763290 -0.108921 -0.467167 1.387327
d -0.241815 -0.869941 -0.756848 -0.335477
e -1.583691 -0.236361 -1.007421 0.298171
f -3.173293 0.521770 -0.326190 1.604712
Traceback (most recent call last):
File "C:\test.py", line 64, in <module>
print(df[(df.A < .5) or (df.B > .5)])
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
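It's important to realize that you cannot use any of the Python logical operators (and, or, not) on pandas objects such as Series and DataFrame. Use the element-wise operators &, | and ~ instead, and wrap each comparison in parentheses.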
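As a rough illustration of the point above, here is a small sketch using a hand-made frame (the values and the variable names either, both, neither and or_failed are made up for this example, not taken from the question). The parentheses matter because & and | bind more tightly than the comparison operators.

```python
import pandas as pd

# Small hand-made frame (illustrative values) so the result is easy to check by eye.
df = pd.DataFrame({"A": [0.1, 0.9, 0.3], "B": [0.7, 0.2, 0.4]}, index=list("abc"))

# Element-wise operators work row by row on the boolean Series.
either = df[(df.A < 0.5) | (df.B > 0.5)]       # OR
both = df[(df.A < 0.5) & (df.B > 0.5)]         # AND
neither = df[~((df.A < 0.5) | (df.B > 0.5))]   # NOT

print(list(either.index))
print(list(both.index))
print(list(neither.index))

# The Python keyword asks each Series for one truth value, which raises.
try:
    df[(df.A < 0.5) or (df.B > 0.5)]
    or_failed = False
except ValueError as exc:
    or_failed = True
    print("or failed:", exc)
```

With this data, rows a and c satisfy the OR condition, only row a satisfies the AND condition, and row b is the only one left after negation.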
Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns.
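Because & and | are overridable (customizable): you can write the code that drives these operators for any class. The logical operators (and, or), on the other hand, have standard behavior that cannot be modified: they always reduce each operand to a single truth value, which is exactly what an array with more than one element refuses to provide.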
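To make the difference concrete, here is a minimal sketch with a toy class. Mask is a hypothetical stand-in written for this example, not a pandas or NumPy type, and it only imitates the general idea rather than how those libraries are actually implemented. It defines the special methods behind &, | and ~, while and/or can only reach the object through its truth value.

```python
class Mask:
    """Toy stand-in for a boolean array (hypothetical, not a pandas class)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Called for `self & other`: combine element by element.
        return Mask(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # Called for `self | other`.
        return Mask(a or b for a, b in zip(self.values, other.values))

    def __invert__(self):
        # Called for `~self`.
        return Mask(not a for a in self.values)

    def __bool__(self):
        # `and`, `or`, `not` and `if` only ever ask for one truth value;
        # there is no hook that lets a class redefine the keywords themselves.
        raise ValueError("truth value of a Mask with several elements is ambiguous")


x = Mask([True, False, True])
y = Mask([False, False, True])

print((x & y).values)  # [False, False, True]
print((x | y).values)  # [True, False, True]

try:
    x or y
except ValueError as exc:
    error_message = str(exc)
    print("or failed:", exc)
```

Because or has to decide whether x is true before it even looks at y, the only thing the class can do is answer (or refuse to answer) that single question, which is why the error in the question appears.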
See here for the relevant documentation.