I am converting SQL code to PySpark and came across some SQL statements. I don't know how to approach CASE statements in PySpark. I was planning to create an RDD, then use rdd.map and do some logic checks. Is that the right approach? Please help!
Basically, I need to go through each row in the RDD or DataFrame and, based on some logic, edit one of the column values.
CASE WHEN (e."a" LIKE 'a%' OR e."b" LIKE 'b%') AND e."aa" = 'BW'
          AND CAST(e."abc" AS decimal(10,4)) = 75.0 THEN 'callitA'
     WHEN (e."a" LIKE 'b%' OR e."b" LIKE 'a%') AND e."aa" = 'AW'
          AND CAST(e."abc" AS decimal(10,4)) = 75.0 THEN 'callitB'
     ELSE 'CallitC'
END
Like the SQL CASE WHEN statement and the switch / if-then-else constructs from popular programming languages, the Spark SQL DataFrame API supports similar logic through when().otherwise(), or through the SQL CASE WHEN expression itself.
PySpark's CASE WHEN follows the standard SQL form: CASE WHEN cond1 THEN result1 WHEN cond2 THEN result2 ... ELSE default END. For the DataFrame equivalent, import the when() function (which chains with otherwise()) from pyspark.sql.functions.
Spark can be case sensitive, but it is case insensitive by default. To avoid potential data corruption or data loss, duplicate column names are not allowed. When spark.sql.caseSensitive is set to false, Spark performs case-insensitive column name resolution between the Hive metastore schema and the Parquet schema, so even when column names differ in letter case, Spark returns the corresponding column values.
These are a few ways to write an if-else / when-then-else / when-otherwise expression in PySpark.
Sample dataframe:

df = spark.createDataFrame([(1,1),(2,2),(3,3)], ['id','value'])
df.show()
#+---+-----+
#| id|value|
#+---+-----+
#|  1|    1|
#|  2|    2|
#|  3|    3|
#+---+-----+

#Desired Output:
#+---+-----+----------+
#| id|value|value_desc|
#+---+-----+----------+
#|  1|    1|       one|
#|  2|    2|       two|
#|  3|    3|     other|
#+---+-----+----------+
Option 1: withColumn() using when-otherwise

from pyspark.sql.functions import when

df.withColumn("value_desc",
              when(df.value == 1, 'one')
              .when(df.value == 2, 'two')
              .otherwise('other')).show()
Option 2: select() using when-otherwise

from pyspark.sql.functions import when

df.select("*",
          when(df.value == 1, 'one')
          .when(df.value == 2, 'two')
          .otherwise('other').alias('value_desc')).show()
Option 3: selectExpr() using the SQL-equivalent CASE expression

df.selectExpr("*",
              "CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END AS value_desc").show()
A SQL-like expression can also be used inside withColumn() and select() via the pyspark.sql.functions.expr function. Here are examples.
Option 4: select() using the expr function

from pyspark.sql.functions import expr

df.select("*",
          expr("CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END AS value_desc")).show()
Option 5: withColumn() using the expr function (the alias inside the expression is unnecessary here, since withColumn() already names the column)

from pyspark.sql.functions import expr

df.withColumn("value_desc",
              expr("CASE WHEN value == 1 THEN 'one' WHEN value == 2 THEN 'two' ELSE 'other' END")).show()
Output:

#+---+-----+----------+
#| id|value|value_desc|
#+---+-----+----------+
#|  1|    1|       one|
#|  2|    2|       two|
#|  3|    3|     other|
#+---+-----+----------+