Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CASE statements in Hive

Ok, i have a following code to mark records that have highest month_cd in tabl with binary flag:

Select t1.month_cd, t2.max_month_cd
  ,CASE WHEN t2.max_month_cd != null then 0 else 1 end test_1
  ,CASE WHEN t2.max_month_cd = null then 0 else 1 end test_2
from source t1
Left join (
  Select 
    MAX(month_cd) as max_month_cd 
  From source 
) t2 
on t1.month_cd = t2.max_month_cd;

It seems straight forward to me, but result it return is:

month_cd  max_month_cd  test_1  test_2
201610    null          1       1
201611    201611        1       1

Makes zero sense to me, and seems to be way too obvious to be a bug in execution engine. What am i missing?

like image 600
JagdCrab Avatar asked Dec 07 '16 17:12

JagdCrab


1 Answers

This is all about NULL concept.

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown. NULL is nothing, absence of object. So, nothing can NOT be equal to NULL or something else. In SQL there are IS NULL and IS NOT NULL conditions to be used for test for null values.

In your CASE the result of logical expression is unknown, so ELSE value is assigned.

Corrected version:

CASE WHEN t2.max_month_cd IS NOT null then 0 else 1 end test_1,
CASE WHEN t2.max_month_cd IS null     then 0 else 1 end test_2

See this for reference: Null (SQL)

like image 80
leftjoin Avatar answered Sep 23 '22 09:09

leftjoin