I have a Spark DataFrame with two columns, and I am trying to create a new column from both of them using the when/otherwise operation:
df_newcol = df.withColumn("Flag", when(col("a") <= lit(ratio1) | col("b") <= lit(ratio1), 1).otherwise(2))
But this throws an error:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
I have used when and otherwise before with a single column. When using them with multiple columns, do I have to write the logic differently?
Thanks.
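You have an operator precedence issue. In Python, | and & bind more tightly than comparison operators such as <=, so your condition is parsed as the chained comparison col("a") <= (lit(ratio1) | col("b")) <= lit(ratio1). Python evaluates a chained comparison with an implicit "and", which calls bool() on a Column, and that is what raises the ValueError. Put each comparison in parentheses whenever it is mixed with logical operators such as & and |. Once that is fixed, you don't even need lit; a plain scalar works as well: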
import pyspark.sql.functions as F
df = spark.createDataFrame([[1, 2], [2, 3], [3, 4]], ['a', 'b'])
Both of the following should work:
df.withColumn('flag', F.when((F.col("a") <= F.lit(2)) | (F.col("b") <= F.lit(2)), 1).otherwise(2)).show()
+---+---+----+
|  a|  b|flag|
+---+---+----+
|  1|  2|   1|
|  2|  3|   1|
|  3|  4|   2|
+---+---+----+
df.withColumn('flag', F.when((F.col("a") <= 2) | (F.col("b") <= 2), 1).otherwise(2)).show()
+---+---+----+
|  a|  b|flag|
+---+---+----+
|  1|  2|   1|
|  2|  3|   1|
|  3|  4|   2|
+---+---+----+
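To see why the parentheses matter without starting a Spark session, here is a minimal pure-Python sketch. It assumes a hypothetical toy class Expr that only mimics the relevant behaviour of pyspark.sql.Column (comparisons and | build expressions, bool() raises); it is an illustration of Python's parsing, not the PySpark API.

```python
# Hypothetical toy class standing in for pyspark.sql.Column (not the real API):
# <= and | build expression text, and bool() raises, as Column does.
class Expr:
    def __init__(self, text):
        self.text = text

    def __le__(self, other):
        return Expr(f"({self.text} <= {_show(other)})")

    def __or__(self, other):
        return Expr(f"({self.text} | {_show(other)})")

    def __ror__(self, other):
        # Called for "2 | b", because int does not know how to | with Expr.
        return Expr(f"({_show(other)} | {self.text})")

    def __bool__(self):
        raise ValueError("Cannot convert column into bool")


def _show(value):
    return value.text if isinstance(value, Expr) else repr(value)


a, b = Expr("a"), Expr("b")

# Parenthesized: each comparison is built first, then combined with |.
ok = (a <= 2) | (b <= 2)
print(ok.text)  # ((a <= 2) | (b <= 2))

# Unparenthesized: | binds tighter than <=, so Python reads this as the
# chained comparison a <= (2 | b) <= 2, i.e. (a <= (2 | b)) and ((2 | b) <= 2).
# The implicit "and" calls bool() on the first Expr, which raises.
caught = None
try:
    a <= 2 | b <= 2
except ValueError as err:
    caught = err
print("ValueError:", caught)  # ValueError: Cannot convert column into bool
```

The same parse happens with real Columns, which is why the error message talks about converting a column into bool even though the code never calls bool() explicitly.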