Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery Merge - Insert new rows if not mached

I'm trying to write a Merge query in Google BigQuery (part of an ETL process).

I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.

This is an example of the first way (the classic 'Upsert') query:

MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key    
WHEN NOT MATCHED THEN
  INSERT ROW
WHEN MATCHED THEN
  UPDATE SET Col1 = S.Col1, Col2 = S.Col2

in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.

The other way of doing it is to inserting new row when values not matched:

MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW

I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.

I can't find a proper way of writing this query and handle Nulls comparison.

It's not possible to check for Nulls at the merge condition, Ex:

ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW

Error message:

RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

It's not possible also to use the Target table reference at the WHERE clause, Ex:

ON T.A = S.A
WHEN NOT MATCHED AND 
   S.A IS NOT NULL AND T.A IS NOT NULL
THEN
   INSERT ROW

What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!

like image 973
Guy P Avatar asked Nov 27 '19 06:11

Guy P


People also ask

Does BigQuery maintain order of insertion?

When you append data, we will append to the end of the table data list, however, bigquery may periodically coalesce data, which does not respect ordering.

What is Upsert in BigQuery?

As mentioned earlier, BigQuery UPSERT means UPDATE+INSERT. It is essentially INSERT that behaves like an UPDATE in case of duplicates, so as to not violate any uniqueness constraints.

Can you update rows in BigQuery?

The BigQuery data manipulation language (DML) enables you to update, insert, and delete data from your BigQuery tables. You can execute DML statements just as you would a SELECT statement, with the following conditions: You must use Google Standard SQL.

What does except distinct do in BigQuery?

The table operator except [distinct] returns the rows of the first result except those that are also in the second.


1 Answers

Can you use a "magic" number or id?

This works:

CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
   , (null, 1, 3, 500)
  ]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
   , (1, 1, 1, 1000)
   , (null, null, null, 10000)
   , (null, 1, 3, 10000)
  ]);

enter image description here

enter image description here

MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999) and IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999) and IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;

enter image description here

like image 139
Felipe Hoffa Avatar answered Sep 20 '22 22:09

Felipe Hoffa