I need some help understanding joins in bigquery standard sql. I want to do a left join keeping all the columns in table1, and joining to table2 if 2 fields match OR a different 2 fields match. This should be better explained in my example.
Table1:
id1 id2 column1
1 a first
2 b second
3 c third
4 d fourth
5 e fifth
6 f sixth
Table 2:
id3 id4 column2
1 5674 alpha
2 4535 bravo
345 a charlie
341 b delta
I want to keep all the rows in table1 and all the rows from table2 if id1 = id3 OR id2 = id4, the resulting table would look like this:
Result:
id1 id2 column1 id3 id4 column2
1 a first 1 5674 alpha
1 a first 345 a charlie
2 b second 2 4535 bravo
2 b second 341 b delta
3 c third
4 d fourth
5 e fifth
6 f sixth
Although I can't get this result as it seems I can't do a left join with an OR statement in the ON clause.
I have tried this query:
SELECT * FROM table1
JOIN table2
on (table1.id1 = table2.id3) or (table1.id2 = table2.id4)
Which is an inner join and results in:
id1 id2 column1 id3 id4 column2
1 a first 1 5674 alpha
1 a first 345 a charlie
2 b second 2 4535 bravo
2 b second 341 b delta
Which is almost there but does not include the other rows from table1
Trying this query:
SELECT * FROM table1
LEFT JOIN table2
on (table1.id1 = table2.id3) or (table1.id2 = table2.id4)
Results in the error:
LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
I understand that there may be a workaround which I would love to hear, but also it would be good to understand why the same method for the inner join is not applicable to the left join.
Thanks in advance, Ryan
EDIT I am still struggling to get my head around why this is not possible, Mikhail's workaround is fine quite heavy on resources and I'd like to understand what is preventing me from using the conditional left join.
The WITH clause contains one or more common table expressions (CTEs). Each CTE binds the results of a subquery to a table name, which can be used elsewhere in the same query expression. BigQuery does not materialize the results of non-recursive CTEs within the WITH clause.
If you want to remove a specific character from your String then you can use the Trimming function to do so. Based on the position of the character that you wish to remove there are three kinds of BigQuery String Functions: TRIM (value1[, value2]): It removes all the leading and trailing characters that match value2.
Below is for BigQuery Standard SQL
one of the options:
#standardSQL
SELECT DISTINCT * FROM (
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id1 = id3
UNION ALL
SELECT * FROM `project.dataset.table1`
LEFT JOIN `project.dataset.table2`
ON id2 = id4
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With