I know that logical AND is &, and logical OR is | in a Pandas Series, but I was looking for an element-wise logical XOR. I could express it in terms of AND and OR, I suppose, but I'd prefer to use an XOR if one is available.
Thank you!
In Python, the bitwise XOR operation is written with the "^" symbol. It can be used for several purposes: XOR of two integers, XOR of two booleans, swapping two numbers, and so on. The operator module also provides an equivalent xor() function.
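As a small illustration of the uses listed above, here is a minimal sketch in plain Python. The variable names and values are made up for the example.

```python
from operator import xor

# XOR of two integers: 0b101 ^ 0b011 == 0b110
int_result = 5 ^ 3

# XOR of two booleans, with the operator and with operator.xor()
bool_result = True ^ False
func_result = xor(True, True)

# Swapping two integers with XOR, without a temporary variable
x, y = 10, 20
x ^= y
y ^= x
x ^= y

print(int_result, bool_result, func_result, x, y)
```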
Bitwise operators have a higher precedence than comparison operators, so the bitwise operation is evaluated before the comparison. Each comparison therefore needs its own parentheses, just as we would write the corresponding mathematical expression.
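To make the precedence point concrete, here is a hedged sketch with a hypothetical DataFrame (the column names "a" and "b" and the thresholds are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"a": [1, 5, 3, 8], "b": [4, 2, 7, 9]})

# Parenthesize each comparison so it is evaluated before ^
mask = (df["a"] > 2) ^ (df["b"] > 5)

# Without parentheses, df["a"] > 2 ^ df["b"] > 5 is parsed as a chained
# comparison around (2 ^ df["b"]), which pandas rejects with a ValueError
# about the ambiguous truth value of a Series.

print(df[mask])
```

Rows are kept where exactly one of the two conditions holds.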
You can filter out rows with NaN values in a pandas DataFrame column (string, float, datetime, etc.) by using the DataFrame.dropna() and DataFrame.notnull() methods. Python has no Null value, so missing data is represented as None or NaN.
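A minimal sketch of both approaches, assuming a small made-up DataFrame (the column names "name" and "score" are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({"name": ["x", None, "z"], "score": [1.0, 2.0, np.nan]})

# Drop every row that has a missing value in any column
clean_all = df.dropna()

# Keep rows where "score" is present, using a boolean mask
clean_score = df[df["score"].notnull()]

# Drop rows only when "name" is missing
clean_name = df.dropna(subset=["name"])

print(clean_all, clean_score, clean_name, sep="\n\n")
```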
Pandas provides element-wise operators for boolean Series: & for logical AND, | for logical OR, ~ for logical NOT, and ^ for logical XOR.
Python XOR: a ^ b
Numpy logical XOR: np.logical_xor(a,b)
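Applied to the pandas Series from the question, both forms work element-wise. A short sketch, with made-up Series s1 and s2, that also shows the AND/OR formulation for comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical boolean Series covering all four input combinations
s1 = pd.Series([True, True, False, False])
s2 = pd.Series([True, False, True, False])

xor_op = s1 ^ s2                      # Python operator, overloaded by pandas
xor_np = np.logical_xor(s1, s2)       # NumPy ufunc applied to the Series
xor_manual = (s1 | s2) & ~(s1 & s2)   # the same result from AND, OR and NOT

print(xor_op.tolist())
```

All three give True exactly where the two inputs differ.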
Testing performance (the results are equal):
1. Sequence of random booleans with size 10000
In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)
In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop
2. Sequence of random booleans with size 1000
In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)
In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop
3. Sequence of random booleans with size 100
In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)
In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop
In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop
4. Sequence of random booleans with size 10
In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)
In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop
In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
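To repeat the comparison outside IPython, something like the following sketch should work. It uses the standard timeit module instead of %timeit; the seed, loop counts, and the results dictionary are arbitrary choices made for this example, and absolute timings will differ by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
results = {}

for size in (10, 100, 1000, 10000):
    a = rng.choice([True, False], size=size)
    b = rng.choice([True, False], size=size)

    # Both forms should produce identical boolean arrays
    assert np.array_equal(a ^ b, np.logical_xor(a, b))

    # Best of 3 runs, reported as seconds per call
    n = 1000
    t_op = min(timeit.repeat(lambda: a ^ b, number=n, repeat=3)) / n
    t_np = min(timeit.repeat(lambda: np.logical_xor(a, b), number=n, repeat=3)) / n
    results[size] = (t_op, t_np)
    print(f"size={size}: a ^ b {t_op:.2e} s, np.logical_xor {t_np:.2e} s")
```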