Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive outer join: how to change the default NULL value

Tags:

join

hadoop

hive

For hive outer join, if a joining key does not exist in one table,hive will put NULL. Is that possible to use another value for this? For example:

Table1:

user_id, name, age
1        Bob   23
2        Jim   43

Table2:

user_id,   txn_amt,      date
 1          20.00      2013-12-10
 1          10.00      2014-07-01

If I do a LEFT OUTER JOIN on user_id:

INSERT INTO TABLE user_txn
SELECT 
     Table1.user_id,
     Table1.name,
     Table2.txn_amt,
     Table2.date
FROM
    Table2
LEFT OUTER JOIN
Table1
ON
Table1.user_id = Table2.user_id;

I want the output be like this:

user_id,    name,   tnx_amt,   date
1           Bob     20.00      2013-12-10
1           Bob     10.00      2014-07-01
2           Jim     0.00       2099-12-31

Note the txn_amt and date columns for Jim. Is there any way in hive to define default values like this?

like image 590
Rainfield Avatar asked Jul 08 '14 16:07

Rainfield


1 Answers

You can use COALESCE for this, instead of solely Table2.txn_amt

COALESCE(Table2.txn_amt, 0.0)

What this does is returns the first value that is not null. So, if txn_amt is null, it'll go to the second value in the list. 0.0 is never null, so it'll pick that. If txn_amt has a value in it, it'll return that value.

like image 133
Donald Miner Avatar answered Oct 09 '22 19:10

Donald Miner