
How does Scala handle isnull or ifnull in query with sqlContext

I have two data files as below:

course.txt 
id,course 
1,Hadoop
2,Spark
3,HBase
5,Impala

Fee.txt 
id,amount 
2,3900
3,4200
4,2900

I need to list every course along with its fee:

sqlContext.sql("select c.id, c.course, f.amount from course c left outer join fee f on f.id = c.id").show
+---+------+------+
| id|course|amount|
+---+------+------+
|  1|Hadoop|  null|
|  2| Spark|3900.0|
|  3| HBase|4200.0|
|  5|Impala|  null|
+---+------+------+

If a course has no entry in the fee table, I want to show 'N/A' instead of null.

I've tried the following, but neither attempt works:

command 1:

sqlContext.sql("select c.id, c.course, ifnull(f.amount, 'N/A') from course c left outer join fee f on f.id = c.id").show

Error: org.apache.spark.sql.AnalysisException: undefined function ifnull; line 1 pos 40

command 2:

sqlContext.sql("select c.id, c.course, isnull(f.amount, 'N/A') from course c left outer join fee f on f.id = c.id").show

Error: org.apache.spark.sql.AnalysisException: No handler for Hive udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNull because: The operator 'IS NULL' only accepts 1 argument..; line 1 pos 40

What is the right way to handle this in sqlContext within Scala? Thank you very much.

Choix asked Mar 25 '26 02:03



2 Answers

With the Spark DataFrame API, you can use when/otherwise with an isNull condition:

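// Imports for when(...) and the $"..." / toDF syntax; `spark` is the active SparkSession
import org.apache.spark.sql.functions.when
import spark.implicits._

val course = Seq(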
  (1, "Hadoop"),
  (2, "Spark"),
  (3, "HBase"),
  (5, "Impala")
).toDF("id", "course")

val fee = Seq(
  (2, 3900),
  (3, 4200),
  (4, 2900)
).toDF("id", "amount")

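// The left outer join keeps every course; missing amounts become "N/A".
// The cast makes both branches strings instead of relying on implicit coercion.
course.join(fee, Seq("id"), "left_outer").
  withColumn("amount", when($"amount".isNull, "N/A").otherwise($"amount".cast("string"))).
  show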
// +---+------+------+
// | id|course|amount|
// +---+------+------+
// |  1|Hadoop|   N/A|
// |  2| Spark|  3900|
// |  3| HBase|  4200|
// |  5|Impala|   N/A|
// +---+------+------+
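As a variation on the when/otherwise approach above, the same replacement can be sketched with the coalesce column function, which returns its first non-null argument. This is only a minimal sketch: the local SparkSession setup, the app name, and the name withFee are assumptions for illustration, and the amount column is cast to string so that both arguments share one type.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, lit}

// Assumption: a local session for experimenting; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

val course = Seq((1, "Hadoop"), (2, "Spark"), (3, "HBase"), (5, "Impala")).toDF("id", "course")
val fee = Seq((2, 3900), (3, 4200), (4, 2900)).toDF("id", "amount")

// coalesce picks the amount when it is present and the literal "N/A" otherwise
val withFee = course.join(fee, Seq("id"), "left_outer").
  withColumn("amount", coalesce($"amount".cast("string"), lit("N/A"))).
  orderBy("id")

withFee.show()
```

Note that the resulting amount column is a string column, so it is suited to display rather than further arithmetic.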

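If you prefer Spark SQL, here is an equivalent query (createOrReplaceTempView and the spark session require Spark 2.0 or later; on 1.x, registerTempTable and sqlContext.sql play the same roles):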

course.createOrReplaceTempView("coursetable")
fee.createOrReplaceTempView("feetable")

val result = spark.sql("""
  select
    c.id, c.course,
    case when f.amount is null then 'N/A' else f.amount end as amount
  from
    coursetable c left outer join feetable f on f.id = c.id
""")
Leo C answered Mar 27 '26 16:03



If you are using Spark SQL, use the built-in coalesce function (it is not a UDF), which returns its first non-null argument:

select
  c.id,
  c.course,
  coalesce(f.amount, 'N/A') as amount
from course c
left outer join fee f
on f.id = c.id
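For completeness, here is a minimal sketch of running a coalesce query from Scala end to end. As the error message in the question indicates, isnull only accepts one argument, so it cannot supply a default value, whereas coalesce can. The local SparkSession setup, the app name, and the explicit cast to string are assumptions added for illustration; the view names course and fee follow the table names used in the question.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("coalesce-sql-sketch").getOrCreate()
import spark.implicits._

// Register the sample data from the question under the names used in the query
Seq((1, "Hadoop"), (2, "Spark"), (3, "HBase"), (5, "Impala")).
  toDF("id", "course").createOrReplaceTempView("course")
Seq((2, 3900), (3, 4200), (4, 2900)).
  toDF("id", "amount").createOrReplaceTempView("fee")

// Casting the amount to string keeps both coalesce arguments the same type
val result = spark.sql("""
  select c.id, c.course, coalesce(cast(f.amount as string), 'N/A') as amount
  from course c
  left outer join fee f on f.id = c.id
  order by c.id
""")

result.show()
```

The explicit cast is a design choice rather than a requirement of coalesce itself: it avoids depending on how a given Spark configuration coerces a numeric column and a string literal to a common type.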
maxmithun answered Mar 27 '26 14:03
