 

How do I use multiple conditions with pyspark.sql.functions.when()?

I have a dataframe with a few columns. Now I want to derive a new column from 2 other columns:

from pyspark.sql import functions as F

new_df = df.withColumn("new_col",
                       F.when(df["col-1"] > 0.0 & df["col-2"] > 0.0, 1)
                        .otherwise(0))

With this I only get an exception:

py4j.Py4JException: Method and([class java.lang.Double]) does not exist 

It works with just one condition like this:

new_df = df.withColumn("new_col", F.when(df["col-1"] > 0.0, 1).otherwise(0)) 

Does anyone know how to use multiple conditions?

I'm using Spark 1.4.

asked Oct 15 '15 by jho

People also ask

How do you write multiple conditions in PySpark?

In PySpark, multiple conditions for when() can be built using & (for and) and | (for or); each individual condition must be wrapped in parentheses.
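As a minimal sketch (the "label" column and the both/one-positive logic are illustrative, not from the question; df is assumed to have the question's col-1 and col-2):

from pyspark.sql import functions as F

# Parenthesize each comparison before combining with & or |
labeled = df.withColumn(
    "label",
    F.when((F.col("col-1") > 0.0) & (F.col("col-2") > 0.0), "both positive")
     .when((F.col("col-1") > 0.0) | (F.col("col-2") > 0.0), "one positive")
     .otherwise("neither"))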

How do you use multiple filter conditions in PySpark?

In PySpark, to filter() rows of a DataFrame on multiple conditions, you can use either a Column with a condition or a SQL expression. A simple example uses an AND (&) condition; you can extend it with OR (|) and NOT (~) conditional expressions as needed.
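A short sketch of the three operators, reusing the question's df and columns:

from pyspark.sql import functions as F

# AND: both conditions must hold
df.filter((F.col("col-1") > 0.0) & (F.col("col-2") > 0.0))

# OR: at least one condition must hold
df.filter((F.col("col-1") > 0.0) | (F.col("col-2") > 0.0))

# NOT: negate a condition with ~
df.filter(~(F.col("col-1") > 0.0))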

How do you use when and otherwise in PySpark?

1. Using when() otherwise() on a PySpark DataFrame. PySpark's when() is a SQL function; to use it you must first import it from pyspark.sql.functions, and it returns a Column type. otherwise() is a method of Column; when otherwise() is not used and none of the conditions are met, the column is assigned None (null). Usage follows the pattern when(condition, value).otherwise(default).
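A small sketch of that null behavior, again using the question's df (the "flag" column name is illustrative):

from pyspark.sql import functions as F

# Without otherwise(): rows that match no condition get null
df.withColumn("flag", F.when(F.col("col-1") > 0.0, 1)).show()

# With otherwise(): rows that match no condition get 0 instead
df.withColumn("flag", F.when(F.col("col-1") > 0.0, 1).otherwise(0)).show()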

How do you write if else condition in PySpark?

PySpark: withColumn() with two conditions and three outcomes. Chain when() calls, checking the null case first (fruit1 and fruit2 are the columns from this example):

from pyspark.sql.functions import col, when

df = df.withColumn("new_column",
                   when(col("fruit1").isNull() | col("fruit2").isNull(), 3)
                   .when(col("fruit1") == col("fruit2"), 1)
                   .otherwise(0))


2 Answers

Use parentheses to enforce the desired operator precedence. In Python, & binds more tightly than comparison operators such as >, so without parentheses the condition is parsed as df["col-1"] > (0.0 & df["col-2"]) > 0.0, and evaluating 0.0 & df["col-2"] produces the and([class java.lang.Double]) error:

F.when((df["col-1"] > 0.0) & (df["col-2"] > 0.0), 1).otherwise(0)
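Applied to the question's withColumn call, the corrected statement is:

new_df = df.withColumn("new_col",
                       F.when((df["col-1"] > 0.0) & (df["col-2"] > 0.0), 1)
                        .otherwise(0))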
answered Sep 20 '22 by Ashalynd


In PySpark, multiple conditions in when() can be built using & (for and) and | (for or); it is important to enclose every expression within parentheses when combining them to form the condition.

%pyspark
from pyspark.sql.functions import col, when

dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                               ("id", "code", "amt"))

dataDF.withColumn("new_column",
                  when((col("code") == "a") | (col("code") == "d"), "A")
                  .when((col("code") == "b") & (col("amt") == "4"), "B")
                  .otherwise("A1")).show()

In Spark Scala, when can be used with the && and || operators to build multiple conditions:

// Scala, assuming a SparkSession named spark
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._

val dataDF = Seq((66, "a", "4"),
                 (67, "a", "0"),
                 (70, "b", "4"),
                 (71, "d", "4")).toDF("id", "code", "amt")

dataDF.withColumn("new_column",
                  when(col("code") === "a" || col("code") === "d", "A")
                  .when(col("code") === "b" && col("amt") === "4", "B")
                  .otherwise("A1"))
      .show()

Output:

+---+----+---+----------+
| id|code|amt|new_column|
+---+----+---+----------+
| 66|   a|  4|         A|
| 67|   a|  0|         A|
| 70|   b|  4|         B|
| 71|   d|  4|         A|
+---+----+---+----------+
answered Sep 20 '22 by vj sreenivasan