Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Left outer join in BigQuery on multiple keys doesn't if one of them is null

I ran into something odd with left outer joins on multiple keys in BigQuery. If one of the keys is null on right table it doesn't match and therefore it puts null in all the values for the right table.

If I've got 2 tables

   Table1              Table2
------------      ----------------
 k1 |k2  |v3      k1 |k2  | v4
------------      ----------------
 foo|boo |hey     foo|NULL| you
 bar|test|yo      bar|test| hi
 foo|NULL|hey

and I do the following join

SELECT t1.k1, t1.k2, t1.v3, t2.v4 FROM [Table1] t1
LEFT OUTER JOIN EACH [Table2] t2 
ON t1.k1=t2.k1 AND t1.k2=t2.k2

I get a result like this

t1_k1|t1_k2|t1_v3|t2_v4
-----------------------
foo  |boo  |hey  |NULL  --No match here so NULL in t2_v4 it's ok
bar  |test |hey  |hi    --It matches here on bar and test
foo  |NULL |hey  |NULL  --It doesn't match on foo NULL.

I would expect that the last row was

foo  |NULL |hey  |you

Is that the expected behaviour? With this I mean is no matches on NULL.

Is there any other way of producing the result I want?

like image 881
pablorc Avatar asked Jul 10 '15 02:07

pablorc


Video Answer


1 Answers

In SQL, NULL is not equal to anything--even itself.

Gordon's answer has some reasonable suggestions, but note that BigQuery only supports join conditions that are conjunctions of equalities, which rules out use of OR or IS NULL.

Can you use a sentinel value other than null? For example, if you replace NULL with the empty string (or the string "null", or something else that doesn't occur elsewhere in your data), the join will work as you expect. You can even do this on the fly by using a subquery, at minimal performance cost.

SELECT t1.k1, t1.k2, t1.v3, t2.v4
FROM
  (SELECT IFNULL(k1, "null") k1, IFNULL(k2, "null") k2 FROM [Table1]) t1
LEFT OUTER JOIN EACH
  (SELECT IFNULL(v3, "null") v3, IFNULL(v4, "null") v4 FROM [Table2]) t2
ON t1.k1 = t2.k1 AND t1.k2 = t2.k2

You can add an outer select to turn the string "null" back into a real NULL.

Obviously this only works if the string "null" doesn't occur elsewhere in your data.

like image 110
Jeremy Condit Avatar answered Sep 18 '22 09:09

Jeremy Condit