I have a dataframe with a few columns. Now I want to derive a new column from 2 other columns:
from pyspark.sql import functions as F

new_df = df.withColumn("new_col", F.when(df["col-1"] > 0.0 & df["col-2"] > 0.0, 1).otherwise(0))
With this I only get an exception:
py4j.Py4JException: Method and([class java.lang.Double]) does not exist
It works with just one condition like this:
new_df = df.withColumn("new_col", F.when(df["col-1"] > 0.0, 1).otherwise(0))
Does anyone know how to use multiple conditions?
I'm using Spark 1.4.
In PySpark, multiple conditions for when can be built using & (for and) and | (for or).
PySpark Filter with Multiple Conditions: In PySpark, to filter() rows of a DataFrame on multiple conditions, you can use either Column expressions or a SQL expression string. Below is a simple example using an AND (&) condition; you can extend it with OR (|) and NOT (~) expressions as needed.
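As a minimal sketch of that filter() usage, assuming a DataFrame df with the asker's numeric columns col-1 and col-2:

# Sketch only: assumes a DataFrame `df` with numeric columns "col-1" and "col-2".
# filter() takes the same Column expressions as when(); parenthesize each
# comparison before combining with & (and), | (or), ~ (not).
both_positive = df.filter((df["col-1"] > 0.0) & (df["col-2"] > 0.0))
either_positive = df.filter((df["col-1"] > 0.0) | (df["col-2"] > 0.0))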
1. Using when() with otherwise() on a PySpark DataFrame. PySpark when() is a SQL function that must be imported before use, and it returns a Column; otherwise() is a method on Column. When otherwise() is not used and none of the conditions match, the result is None (NULL). Usage is when(condition, value).otherwise(default).
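For illustration, a minimal sketch on the asker's columns, including what happens when otherwise() is omitted:

from pyspark.sql import functions as F

# when() is imported from pyspark.sql.functions and returns a Column.
flagged = df.withColumn(
    "new_col",
    F.when((df["col-1"] > 0.0) & (df["col-2"] > 0.0), 1).otherwise(0))

# Without otherwise(), rows matching no condition get NULL instead of a default.
flagged_with_nulls = df.withColumn(
    "new_col",
    F.when((df["col-1"] > 0.0) & (df["col-2"] > 0.0), 1))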
A related case is withColumn() with two conditions and three outcomes, i.e. pseudocode along the lines of: new_column = 1 IF fruit1 == fruit2, 3 IF fruit1 IS NULL OR fruit2 IS NULL, ELSE 0.
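A possible PySpark sketch of that three-outcome case, assuming hypothetical columns fruit1 and fruit2; chained when() branches are checked in order and otherwise() is the fallback:

from pyspark.sql import functions as F

# Hypothetical columns fruit1 and fruit2; the null check comes first so NULL
# rows are labelled 3 rather than falling through to the equality test.
df = df.withColumn(
    "new_column",
    F.when(F.col("fruit1").isNull() | F.col("fruit2").isNull(), 3)
     .when(F.col("fruit1") == F.col("fruit2"), 1)
     .otherwise(0))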
Use parentheses to enforce the desired operator precedence:
F.when( (df["col-1"]>0.0) & (df["col-2"]>0.0), 1).otherwise(0)
when in pyspark multiple conditions can be built using &(for and) and | (for or), it is important to enclose every expressions within parenthesis that combine to form the condition
%pyspark
from pyspark.sql.functions import col, when

dataDF = spark.createDataFrame(
    [(66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")],
    ("id", "code", "amt"))

dataDF.withColumn("new_column",
    when((col("code") == "a") | (col("code") == "d"), "A")
    .when((col("code") == "b") & (col("amt") == "4"), "B")
    .otherwise("A1")).show()
In Spark Scala, when can be used with the && and || operators to build multiple conditions:
//Scala
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

val dataDF = Seq(
  (66, "a", "4"), (67, "a", "0"), (70, "b", "4"), (71, "d", "4")
).toDF("id", "code", "amt")

dataDF.withColumn("new_column",
    when(col("code") === "a" || col("code") === "d", "A")
    .when(col("code") === "b" && col("amt") === "4", "B")
    .otherwise("A1"))
  .show()
Output:
+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66|   a|  4|         A|
| 67|   a|  0|         A|
| 70|   b|  4|         B|
| 71|   d|  4|         A|
+---+----+---+----------+