
Apache Spark CASE with multiple WHEN clauses on different columns

Given the below structure:

val df = Seq("Color", "Shape", "Range", "Size").map(Tuple1.apply).toDF("color")

val df1 = df.withColumn("Success", when($"color" <=> "white", "Diamond").otherwise(0))

I want to add one more WHEN condition to the expression above: when size > 10 and the Shape column value is "Rhombus", the value "Diamond" should be inserted into the column, otherwise 0. I tried the following, but it fails:

val df1 = df.withColumn("Success", when($"color" <=> "white", "Diamond").otherwise(0)).when($"size">10)

Please suggest a solution that uses only the DataFrame API in Scala. Spark SQL through sqlContext is not an option for me.

Thanks!

Asked Feb 20 '17 16:02 by Bandi LokeshReddy


1 Answer

You can chain calls to when, as in the example from the Column.when documentation (available since Spark 1.4.0): https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Column.html#when-org.apache.spark.sql.Column-java.lang.Object-

// Scala:
people.select(when(people("gender") === "male", 0)
 .when(people("gender") === "female", 1)
 .otherwise(2))
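Chained when branches behave like SQL CASE WHEN: the conditions are tested top to bottom, the first one that is true supplies the value, and otherwise is the fallback (without otherwise, unmatched rows get null). The sketch below models that evaluation order in plain Scala so it can be run without Spark. It is only a model of the semantics, not Spark's implementation; the Item, Branch, evalChain and success names are hypothetical, and the default is the string "0" so every branch has the same type.

```scala
// A plain-Scala model (no Spark needed) of how a when/.../otherwise chain picks a value
// for one row: conditions are checked in order and the first one that holds wins.
object WhenChainModel {
  // Hypothetical row type standing in for the DataFrame columns; color is an Option
  // so that a missing (null) value can be represented.
  final case class Item(color: Option[String], shape: String, size: Int)

  // One when(condition, value) branch.
  final case class Branch[A](cond: Item => Boolean, value: A)

  // Mirrors when(c1, v1).when(c2, v2).otherwise(default): return the value of the
  // first branch whose condition is true, or the default if none is.
  def evalChain[A](branches: Seq[Branch[A]], default: A)(row: Item): A =
    branches.find(_.cond(row)).map(_.value).getOrElse(default)

  // The two branches from the question. Option.contains plays the role of <=>:
  // a missing color compares as false instead of producing null.
  val successBranches: Seq[Branch[String]] = Seq(
    Branch[String](_.color.contains("white"), "Diamond"),
    Branch[String](r => r.size > 10 && r.shape == "Rhombus", "Diamond")
  )

  def success(row: Item): String = evalChain(successBranches, "0")(row)

  def main(args: Array[String]): Unit = {
    println(success(Item(Some("white"), "Square", 3))) // first branch matches: Diamond
    println(success(Item(Some("red"), "Rhombus", 12))) // second branch matches: Diamond
    println(success(Item(None, "Rhombus", 5)))         // no branch matches: 0
  }
}
```

Swapping the order of the branches would not change the result here, since both return "Diamond"; order only matters when overlapping conditions map to different values.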

Applied to your example:

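import org.apache.spark.sql.functions.when
import spark.implicits._ // for the $"col" syntax; assumes a SparkSession named spark

val df1 = df.withColumn("Success",
  when($"color" <=> "white", "Diamond")
    .when($"size" > 10 && $"shape" === "Rhombus", "Diamond")
    .otherwise(0))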
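To try the chained expression end to end, here is a minimal sketch that assumes spark-sql is on the classpath and uses a hypothetical three-row DataFrame with color, shape and size columns (the df in the question only has a single color column). It uses otherwise("0") instead of otherwise(0) so that both result values are strings and no implicit cast between string and integer is involved; the object and method names are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

object ChainedWhenExample {
  // Builds a small sample DataFrame, adds the Success column with a chained when,
  // and returns that column's values in row order.
  def successColumn(spark: SparkSession): Seq[String] = {
    import spark.implicits._ // enables toDF, $"col" and the String encoder

    // Hypothetical sample rows: (color, shape, size)
    val df = Seq(
      ("white", "Square", 3),
      ("red", "Rhombus", 12),
      ("blue", "Rhombus", 5)
    ).toDF("color", "shape", "size")

    // Branches are tested in order; otherwise supplies the value when none matches.
    val df1 = df.withColumn("Success",
      when($"color" <=> "white", "Diamond")
        .when($"size" > 10 && $"shape" === "Rhombus", "Diamond")
        .otherwise("0"))

    df1.show()
    df1.select("Success").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; change master/appName for a real cluster.
    val spark = SparkSession.builder().master("local[*]").appName("chained-when").getOrCreate()
    try println(successColumn(spark)) finally spark.stop()
  }
}
```

With this data the first row should get "Diamond" from the color branch, the second "Diamond" from the size and shape branch, and the third "0". Because both branches return the same value, the two conditions could also be merged into a single when with ||; chaining is the general form when each branch returns a different value.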
Answered Sep 22 '22 12:09 by jgaw