I ran into something odd with left outer joins on multiple keys in BigQuery. If one of the keys is null
on right table it doesn't match and therefore it puts null
in all the values for the right table.
If I've got 2 tables
Table1 Table2
------------ ----------------
k1 |k2 |v3 k1 |k2 | v4
------------ ----------------
foo|boo |hey foo|NULL| you
bar|test|yo bar|test| hi
foo|NULL|hey
and I do the following join
SELECT t1.k1, t1.k2, t1.v3, t2.v4 FROM [Table1] t1
LEFT OUTER JOIN EACH [Table2] t2
ON t1.k1=t2.k1 AND t1.k2=t2.k2
I get a result like this
t1_k1|t1_k2|t1_v3|t2_v4
-----------------------
foo |boo |hey |NULL --No match here so NULL in t2_v4 it's ok
bar |test |hey |hi --It matches here on bar and test
foo |NULL |hey |NULL --It doesn't match on foo NULL.
I would expect that the last row was
foo |NULL |hey |you
Is that the expected behaviour? With this I mean is no matches on NULL
.
Is there any other way of producing the result I want?
In SQL, NULL
is not equal to anything--even itself.
Gordon's answer has some reasonable suggestions, but note that BigQuery only supports join conditions that are conjunctions of equalities, which rules out use of OR
or IS NULL
.
Can you use a sentinel value other than null? For example, if you replace NULL
with the empty string (or the string "null"
, or something else that doesn't occur elsewhere in your data), the join will work as you expect. You can even do this on the fly by using a subquery, at minimal performance cost.
SELECT t1.k1, t1.k2, t1.v3, t2.v4
FROM
(SELECT IFNULL(k1, "null") k1, IFNULL(k2, "null") k2 FROM [Table1]) t1
LEFT OUTER JOIN EACH
(SELECT IFNULL(v3, "null") v3, IFNULL(v4, "null") v4 FROM [Table2]) t2
ON t1.k1 = t2.k1 AND t1.k2 = t2.k2
You can add an outer select to turn the string "null"
back into a real NULL
.
Obviously this only works if the string "null"
doesn't occur elsewhere in your data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With