Is there a logical XOR expression available in pyspark?
The documentation here says to use ^ but I am getting the following error when I try doing it.
Boolean operations on pyspark.sql.column.Column objects, which allow you to use &, |, and ~, are defined here. Unfortunately, as you can see, XOR is not among the defined ones.
Thus:
df = spark.range(1000)
df.where((df.id >= 20) & (df.id <= 40)) # will be ok
df.where((df.id >= 20) ^ (df.id <= 40)) # will result in error
You can however write something like this:
df.where((df.id >= 10).cast('int')\
.bitwiseXOR((df.id <= 90).cast('int')).cast('boolean')).show()
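If the casts feel clumsy, another option is to build XOR out of the operators that Column does define. This is only a minimal sketch: logical_xor and demo are hypothetical names of mine, not part of pyspark, and demo assumes you pass in an existing SparkSession like the spark used above. Null values follow the usual Spark SQL three-valued logic, which I have not tried to handle here.

```python
def logical_xor(left, right):
    """Hypothetical helper: true where exactly one of the two conditions holds.

    Uses only &, | and ~, the operators that boolean Column expressions support.
    """
    return (left | right) & ~(left & right)


def demo(spark):
    """Assumes `spark` is an existing SparkSession, as in the snippets above."""
    df = spark.range(50)
    # Keep rows where exactly one of the two comparisons is true.
    return df.where(logical_xor(df.id >= 10, df.id <= 40))
```

Calling demo(spark).show() should list ids 0 to 9 and 41 to 49, since both comparisons are true for ids 10 through 40.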
In case you are extremely unhappy with the lack of (df.id >= 20) ^ (df.id <= 40) you can use something like the following at your own risk (I don't recommend doing it):
import pyspark
df = spark.range(50)
pyspark.sql.column.Column.__xor__ = lambda x, y: x.cast('int').bitwiseXOR(y.cast('int')).cast('boolean')
df.where((df.id >= 10) ^ (df.id <= 40)).show()
Nothing in that link states that you should use ^; the closest occurrence seems to be:
>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([Row(a=170, b=75)])
>>> df.select(df.a.bitwiseXOR(df.b)).collect()
[Row((a ^ b)=225)]
Other occurrences of ^ are used for both regular expressions and exponents so are unrelated.
That particular case is showing a result with the ^ character in it, but it's very much telling you to use bitwiseXOR(). However, there's a big difference between bitwise and logical operations (unless they are applied only to bit values of zero and one in an environment that conflates the two, of course).
And, given that the only occurrences of "logical" on that page have nothing to do with logical operators, the operation appears not to be available.
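To make the bitwise versus logical distinction concrete, here is a small plain-Python sketch using the same 170 and 75 from the documentation example. The logical_xor function is a hypothetical helper I wrote for illustration; it is not something pyspark provides.

```python
def logical_xor(a, b):
    """Hypothetical helper: true when exactly one operand is truthy."""
    return bool(a) != bool(b)


# Bitwise XOR works bit by bit: 0b10101010 ^ 0b01001011 == 0b11100001
bitwise = 170 ^ 75

# Logical XOR only looks at truthiness: both values are non-zero (truthy)
logical = logical_xor(170, 75)

print(bitwise)  # 225
print(logical)  # False
```

The two only agree when the operands are restricted to 0 and 1, which is why the cast to int and back to boolean in the other answer works.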