
SPARK SQL - case when then

I'm new to Spark SQL. Is there an equivalent to "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in Spark SQL?

select case when 1=1 then 1 else 0 end from table

Thanks, Sridhar

user3279189 asked Aug 06 '14 10:08


4 Answers

Before Spark 1.2.0

The supported syntax (which I just tried out on Spark 1.0.2) seems to be:

SELECT IF(1=1, 1, 0) FROM table

This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.

Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.

Update for Spark 1.2.0 and beyond

As of Spark 1.2.0, the more traditional CASE WHEN ... THEN ... ELSE ... END syntax is supported, in response to SPARK-3813. Search for "CASE WHEN" in the test source to see examples.
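
A minimal sketch of what such a query looks like and how it evaluates. Assumptions: Python's built-in sqlite3 module stands in for Spark purely to keep the example self-contained, and the customer table with its id and state columns is hypothetical sample data. In Spark the same SQL string would normally be passed to sqlContext.sql(...) or, in later versions, spark.sql(...); check the behaviour on your version.

```python
import sqlite3

# Assumption: SQLite is only a stand-in engine here, not Spark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "NY"), (2, "MA"), (3, "CA")],  # hypothetical sample rows
)

# Searched CASE: the first WHEN whose condition holds supplies the value,
# otherwise the ELSE value is used.
query = """
    SELECT id,
           CASE WHEN id = 1 THEN 1 ELSE 0 END AS is_one
    FROM customer
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (2, 0), (3, 0)]
```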


Update for the most recent place to figure out syntax from the SQL parser

The parser source can now be found here.

Update for more complex examples

In response to a question below, the modern syntax supports complex Boolean conditions.

SELECT
    CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdRedux
FROM customer

You can involve multiple columns in the condition.

SELECT
    CASE WHEN id = 1 OR state = 'MA'
         THEN "OneOrMA"
         ELSE "NotOneOrMA" END AS IdRedux
FROM customer

You can also nest CASE WHEN THEN expressions.

SELECT
    CASE WHEN id = 1
         THEN "OneOrMA"
         ELSE
             CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END
    END AS IdRedux
FROM customer
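
A nested CASE placed in the ELSE branch should give the same result as the flat OR condition shown earlier. The sketch below checks that on a few rows. Assumptions: sqlite3 is used as a stand-in engine rather than Spark, the sample rows are hypothetical, and string literals use single quotes because SQLite treats double quotes as identifier quoting.

```python
import sqlite3

# Assumption: SQLite stands in for Spark; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "NY"), (2, "MA"), (3, "CA")],
)

# Inner CASE is only evaluated when the outer condition (id = 1) is not met.
nested = """
    SELECT id,
           CASE WHEN id = 1
                THEN 'OneOrMA'
                ELSE CASE WHEN state = 'MA' THEN 'OneOrMA' ELSE 'NotOneOrMA' END
           END AS IdRedux
    FROM customer ORDER BY id
"""
# Same logic expressed as a single condition joined with OR.
flat = """
    SELECT id,
           CASE WHEN id = 1 OR state = 'MA' THEN 'OneOrMA' ELSE 'NotOneOrMA' END AS IdRedux
    FROM customer ORDER BY id
"""
nested_rows = conn.execute(nested).fetchall()
flat_rows = conn.execute(flat).fetchall()
print(nested_rows)                # [(1, 'OneOrMA'), (2, 'OneOrMA'), (3, 'NotOneOrMA')]
print(nested_rows == flat_rows)   # True
```

Nesting is mainly worth it when the inner branches produce different values; when they collapse to the same label, as here, the flat form is easier to read.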

Spiro Michaylov answered Oct 16 '22 12:10

For Spark 2.+, use the Spark when function.

From documentation:

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

   // Example: encoding gender string column into integer.

   // Scala:
   people.select(when(col("gender") === "male", 0)
     .when(col("gender") === "female", 1)
     .otherwise(2))

   // Java:
   people.select(when(col("gender").equalTo("male"), 0)
     .when(col("gender").equalTo("female"), 1)
     .otherwise(2));

Ehud Lev answered Oct 16 '22 11:10

This syntax worked for me in Databricks:

    select
      case
        when (age is null) then 'Not Available'
        when (age < 15) then 'Less than 15'
        when (age >= 15 and age < 25) then '15 to 25'
        when (age >= 25 and age < 35) then '25 to 35'
        when (age >= 35 and age < 45) then '35 to 45'
        when (age >= 45) then '45 and Older'
      end as age_range
    from demo
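
Because CASE branches are tested in order and the first match wins, the lower bounds in the query above are redundant once the earlier branches have been ruled out. The sketch below compares the explicit version with a shorter ordered one. Assumptions: sqlite3 stands in for Databricks/Spark, the ages are hypothetical sample values, and rowid is a SQLite-specific column used only to keep insertion order.

```python
import sqlite3

# Assumption: SQLite is a stand-in engine; the ages are made-up test values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (age INTEGER)")
ages = [None, 10, 15, 24, 25, 44, 45, 70]
conn.executemany("INSERT INTO demo VALUES (?)", [(a,) for a in ages])

# Version with explicit lower and upper bounds on every branch.
explicit = """
    SELECT age,
      CASE
        WHEN age IS NULL THEN 'Not Available'
        WHEN age < 15 THEN 'Less than 15'
        WHEN age >= 15 AND age < 25 THEN '15 to 25'
        WHEN age >= 25 AND age < 35 THEN '25 to 35'
        WHEN age >= 35 AND age < 45 THEN '35 to 45'
        WHEN age >= 45 THEN '45 and Older'
      END AS age_range
    FROM demo ORDER BY rowid
"""
# Version relying on branch order: each WHEN is reached only if all earlier ones failed.
ordered = """
    SELECT age,
      CASE
        WHEN age IS NULL THEN 'Not Available'
        WHEN age < 15 THEN 'Less than 15'
        WHEN age < 25 THEN '15 to 25'
        WHEN age < 35 THEN '25 to 35'
        WHEN age < 45 THEN '35 to 45'
        ELSE '45 and Older'
      END AS age_range
    FROM demo ORDER BY rowid
"""
explicit_rows = conn.execute(explicit).fetchall()
ordered_rows = conn.execute(ordered).fetchall()
print(explicit_rows == ordered_rows)  # True
```

The NULL check has to stay first in the shorter version; otherwise a NULL age would fall through every comparison and land in the ELSE branch.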

John answered Oct 16 '22 11:10


An analog of Oracle SQL's decode() function can be implemented in Spark SQL as follows:

    case
      when exp1 in ('a','b','c')
        then element_at(map('a','A','b','B','c','C'), exp1)
      else exp1
    end
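
Another way to get a decode-style lookup is the simple CASE form, which compares one expression against a list of values. The sketch below shows that form. Assumptions: sqlite3 stands in for Spark, the table t and its rows are hypothetical, and although Spark SQL's documented CASE grammar accepts an expression after CASE, it is worth confirming on your version.

```python
import sqlite3

# Assumption: SQLite is a stand-in engine; table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (exp1 TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("a",), ("b",), ("c",), ("z",)])

# Simple CASE: exp1 is compared for equality with each WHEN value in turn;
# values without a match fall through to ELSE and are returned unchanged.
query = """
    SELECT exp1,
           CASE exp1
             WHEN 'a' THEN 'A'
             WHEN 'b' THEN 'B'
             WHEN 'c' THEN 'C'
             ELSE exp1
           END AS decoded
    FROM t ORDER BY rowid
"""
decoded = conn.execute(query).fetchall()
print(decoded)  # [('a', 'A'), ('b', 'B'), ('c', 'C'), ('z', 'z')]
```

One difference from Oracle's decode() to keep in mind: a CASE comparison uses ordinary equality, so a NULL input never matches a WHEN value.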

mr.polden2010 answered Oct 16 '22 10:10
