I'm trying to write a Merge query in Google BigQuery (part of an ETL process).
I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.
This is an example of the first way (the classic 'Upsert') query:
MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key
WHEN NOT MATCHED THEN
INSERT ROW
WHEN MATCHED THEN
UPDATE SET Col1 = S.Col1, Col2 = S.Col2
in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.
The other way of doing it is to inserting new row when values not matched:
MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW
I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.
I can't find a proper way of writing this query and handle Nulls comparison.
It's not possible to check for Nulls at the merge condition, Ex:
ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW
Error message:
RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
It's not possible also to use the Target table reference at the WHERE
clause, Ex:
ON T.A = S.A
WHEN NOT MATCHED AND
S.A IS NOT NULL AND T.A IS NOT NULL
THEN
INSERT ROW
What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!
When you append data, we will append to the end of the table data list, however, bigquery may periodically coalesce data, which does not respect ordering.
As mentioned earlier, BigQuery UPSERT means UPDATE+INSERT. It is essentially INSERT that behaves like an UPDATE in case of duplicates, so as to not violate any uniqueness constraints.
The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables. You can execute DML statements just as you would a SELECT statement, with the following conditions: You must use Google Standard SQL.
The table operator except [distinct] returns the rows of the first result except those that are also in the second.
Can you use a "magic" number or id?
This works:
CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
, (null, 1, 3, 500)
]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
, (1, 1, 1, 1000)
, (null, null, null, 10000)
, (null, 1, 3, 10000)
]);
MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999) and IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999) and IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With