Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Element-wise XOR in pandas

I know that logical AND is &, and logical OR is | in a Pandas Series, but I was looking for an element-wise logical XOR. I could express it in terms of AND and OR, I suppose, but I'd prefer to use an XOR if one is available.

Thank you!

like image 470
SulfoCyaNate Avatar asked Aug 26 '15 22:08

SulfoCyaNate


People also ask

Can you do XOR in Python?

In Python, we can perform the bitwise XOR operation using the "^" symbol. The XOR operation can be used for different purposes; XOR of two integers, XOR of two booleans, Swapping two numbers using XOR, etc. We can also use the xor() function using the operator module in Python.

Why do pandas use Bitwise Operators?

This is because bitwise operators have a higher precedence than comparison operators, meaning that the bitwise operation will precede that of the comparison. Just how we'd write the mathematical expression too in this case.

How do you filter blank values in Python?

You can filter out rows with NAN value from pandas DataFrame column string, float, datetime e.t.c by using DataFrame. dropna() and DataFrame. notnull() methods. Python doesn't support Null hence any missing data is represented as None or NaN.

What is or operator in pandas DataFrame?

Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.


1 Answers

Python XOR: a ^ b

Numpy logical XOR: np.logical_xor(a,b)

Testing performance - result are equal:

1. Sequence of random booleans with size 10000

In [7]: a = np.random.choice([True, False], size=10000)
In [8]: b = np.random.choice([True, False], size=10000)

In [9]: %timeit a ^ b
The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

In [10]: %timeit np.logical_xor(a,b)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 11 us per loop

2. Sequence of random booleans with size 1000

In [11]: a = np.random.choice([True, False], size=1000)
In [12]: b = np.random.choice([True, False], size=1000)

In [13]: %timeit a ^ b
The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

In [14]: %timeit np.logical_xor(a,b)
The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 1.58 us per loop

3. Sequence of random booleans with size 100

In [15]: a = np.random.choice([True, False], size=100)
In [16]: b = np.random.choice([True, False], size=100)

In [17]: %timeit a ^ b
The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 614 ns per loop

In [18]: %timeit np.logical_xor(a,b)
The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 616 ns per loop

4. Sequence of random booleans with size 10

In [19]: a = np.random.choice([True, False], size=10)
In [20]: b = np.random.choice([True, False], size=10)

In [21]: %timeit a ^ b
The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 509 ns per loop

In [22]: %timeit np.logical_xor(a,b)
The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 511 ns per loop
like image 184
jezrael Avatar answered Sep 28 '22 10:09

jezrael