BigQuery Merge - Insert new rows if not mached

Tags:

google-bigquery

I'm trying to write a Merge query in Google BigQuery (part of an ETL process).

I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.

This is an example of the first way (the classic 'Upsert') query:

MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key    
WHEN NOT MATCHED THEN
  INSERT ROW
WHEN MATCHED THEN
  UPDATE SET Col1 = S.Col1, Col2 = S.Col2

in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.

The other way of doing it is to inserting new row when values not matched:

MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW

I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.

I can't find a proper way of writing this query and handle Nulls comparison.

It's not possible to check for Nulls at the merge condition, Ex:

ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW

Error message:

RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

It's not possible also to use the Target table reference at the WHERE clause, Ex:

ON T.A = S.A
WHEN NOT MATCHED AND 
   S.A IS NOT NULL AND T.A IS NOT NULL
THEN
   INSERT ROW

What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!

973

asked Nov 27 '19 06:11

Guy P

1 Answers

Can you use a "magic" number or id?

This works:

CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
   , (null, 1, 3, 500)
  ]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
   , (1, 1, 1, 1000)
   , (null, null, null, 10000)
   , (null, 1, 3, 10000)
  ]);

enter image description here

MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999) and IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999) and IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;

enter image description here

139

answered Sep 20 '22 22:09

Felipe Hoffa

Related questions
                            
                                Difficult grouped report from database
                            
                                SQL - replacing placeholder with parameter values
                            
                                Zip/repeat join?
                            
                                Oracle - grouping values along interval
                            
                                SQL Server convert xml to csv using CTE - nulls currently being ignored
                            
                                row_overflow data was not converting to in_row data after altering or updating the table
                            
                                DayPilot SQL - Copying appointments that dont have a Date
                            
                                Retrieving Data of Stored Procedure Using ODBC and PHP
                            
                                Right join without intersection and another condition
                            
                                Compare 2 large sql files and find differences to recover data
                            
                                Creating indexes on columns where primary key or unique key constraints already present
                            
                                Convert bytea to double precision in PostgreSQL
                            
                                Microsoft Visual Studio 2017 Mac: Where is server explorer?
                            
                                Can I retrieve paged rows and total count within a single query?
                            
                                SQL - How to select rows which have the same multiple values
                            
                                BigQuery - cannot UNION String and NULL
                            
                                Hierarchical JSON Data to Hierarchical Table
                            
                                SQL subtracting results of 2 select statement
                            
                                How to express this with jOOQ "Select alias.*, otherAlias.Column From .."
                            
                                Row level security for groups or Making rows accebile to groups

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With