I can think of 2 methods to filter in Google BigQuery. (Say Table2 is small.) Which method of filtering would be preferable, and why? Does it depend on the size of table1? Or is there another, better method?
SELECT
a.ID
FROM
table1 AS a
LEFT OUTER JOIN
table2 AS b
ON
a.ID = b.ID
WHERE
b.ID is NULL
or
SELECT
ID
FROM
table1
WHERE
ID NOT IN (
SELECT
ID
FROM
table2)
A left anti join : This join returns rows in the left table that have no matching rows in the right table. A right anti join : This join returns rows in the right table that have no matching rows in the left table.
Use INT64 data types in joins to reduce cost and improve comparison performance. Best practice: If your use case supports it, use INT64 data types in joins instead of STRING data types. BigQuery does not index primary keys like traditional databases, so the wider the join column is, the longer the comparison takes.
In addition, Google BigQuery uses the default equals (=) operator to compare columns and does not support other operators. Select Inner Join to only include records in which the joined columns from both tables satisfy the join condition.
Example. Imagine we have two tables; Cats and Dogs , and we then run the following query: SELECT CatName AS PetName FROM Cats EXCEPT SELECT DogName AS PetName FROM Dogs; This query uses the EXCEPT operator, and it will appear as a LEFT ANTI SEMI JOIN in the execution plan.
Second is limited to one and only one field to be filtered by.
First can be used to filter by multiple fields, like below
SELECT
a.ID
FROM
table1 AS a
LEFT OUTER JOIN
table2 AS b
ON
a.ID = b.ID AND a.Field2 = b.Field2 AND a.Field3 = b.Field3
WHERE
b.ID is NULL
I tested a two fields anti-join and the difference was noticeable.
SELECT fields, ..
FROM inputA A
LEFT JOIN inputB B
ON A.key = B.key AND A.key2 = B.key2
WHERE B.key IS NULL AND B.key2 IS NULL
Performs better than its NOT IN equivalent:
SELECT fields, ..
FROM inputA A
WHERE key NOT IN (SELECT NUM_RUT FROM key)
AND key2 NOT IN (SELECT key2 FROM inputB)
in this case 5s vs 7s in BigQuery after reading 4.6Gb.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With