How to avoid inserting duplicate records when using a T-SQL Merge statement

Tags: merge, sql, tsql

I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused by:

  1. The target table has a Primary Key based on two columns
  2. The source table may contain duplicate records that violate the target table's Primary Key constraint ("Violation of PRIMARY KEY constraint" is thrown)

I'm looking for a way to change my MERGE statement so that it either ignores duplicate records within the source table, or wraps the INSERT in a TRY/CATCH so that exceptions are caught and all the other inserts still run despite the few bad eggs - or maybe there's a better way to go about this problem altogether?

Here's an example of what I'm trying to explain. The query below adds 100k records to a temp table and then attempts to insert those records into the target table -

EDIT In my original post I only included two fields in the example tables, which led SO friends to suggest a DISTINCT solution to avoid duplicates in the MERGE statement. I should have mentioned that in my real-world problem the tables have 15 fields, and two of those 15 fields make up a CLUSTERED PRIMARY KEY. So the DISTINCT keyword doesn't work, because I need to SELECT all 15 fields while ignoring duplicates based on just those two key fields.

I have updated the query below to include one more field, col4. I need to include col4 in the MERGE, but I only need to make sure that the combination of col2 and col3 is unique.

-- Create the source table
CREATE TABLE #tmp (
    col2 datetime NOT NULL,
    col3 int NOT NULL,
    col4 int
)
GO

-- Add a bunch of test data to the source table
-- For testing purposes, allow duplicate records to be added to this table
DECLARE @loopCount int = 100000
DECLARE @loopCounter int = 0
DECLARE @randDateOffset int
DECLARE @col2 datetime
DECLARE @col3 int
DECLARE @col4 int

WHILE @loopCounter < @loopCount
BEGIN
    SET @randDateOffset = RAND() * 100000
    SET @col2 = DATEADD(MI,@randDateOffset,GETDATE())
    SET @col3 = RAND() * 1000
    SET @col4 = RAND() * 10
    INSERT INTO #tmp
    (col2,col3,col4)
    VALUES
    (@col2,@col3,@col4);

    SET @loopCounter = @loopCounter + 1
END

-- Insert the source data into the target table
-- How do we make sure we don't attempt to INSERT a duplicate record? Or how can we 
-- catch exceptions? Or?
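-- (The target table dbo.tbl1 is assumed to already exist, with its clustered
-- primary key on the (col2, col3) combination.)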
MERGE INTO dbo.tbl1 AS tbl
    USING (SELECT * FROM #tmp) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
    WHEN NOT MATCHED THEN 
        INSERT (col2,col3,col4)
        VALUES (src.col2,src.col3,src.col4);
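-- With duplicate (col2, col3) pairs in #tmp, this MERGE fails with a
-- "Violation of PRIMARY KEY constraint" error.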
GO
asked Jul 06 '11 by Jed


2 Answers

Solved to your new specification, inserting only the highest value of col4. This time I used a GROUP BY to prevent duplicate rows in the source:

MERGE INTO dbo.tbl1 AS tbl 
USING (SELECT col2,col3, max(col4) col4 FROM #tmp group by col2,col3) AS src 
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3) 
WHEN NOT MATCHED THEN  
    INSERT (col2,col3,col4) 
    VALUES (src.col2,src.col3,src.col4); 
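
If MAX(col4) isn't the right way to collapse duplicates once all 15 fields are in play, a ROW_NUMBER() window can keep one entire row per (col2, col3) pair instead. A minimal sketch, assuming the extra columns ride along like col4 does and that keeping the row with the highest col4 is an acceptable tie-break:

MERGE INTO dbo.tbl1 AS tbl
USING (
    SELECT col2, col3, col4 -- plus the remaining columns in the real table
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY col2, col3
                                  ORDER BY col4 DESC) AS rn
        FROM #tmp
    ) AS dedup
    WHERE rn = 1 -- exactly one source row per (col2, col3) pair
) AS src
ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2,col3,col4)
    VALUES (src.col2,src.col3,src.col4);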
answered by t-clausen.dk

Given that the source has duplicates and you aren't using MERGE fully (only the WHEN NOT MATCHED branch), I'd use a plain INSERT.

 INSERT dbo.tbl1 (col2,col3) 
 SELECT DISTINCT col2,col3
 FROM #tmp src
 WHERE NOT EXISTS (
       SELECT *
       FROM dbo.tbl1 tbl
       WHERE tbl.col2 = src.col2 AND tbl.col3 = src.col3)
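
(With the real 15-column table, SELECT DISTINCT over all 15 fields won't collapse rows that differ only in the non-key columns; the ROW_NUMBER() approach sketched in the first answer works here as well to pick one row per (col2, col3).)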

The reason the MERGE fails is that matching isn't evaluated row by row. All non-matching source rows are found first, and then it tries to INSERT all of them; it never checks a source row against duplicate rows arriving in the same batch.

This reminds me a bit of the "Halloween problem", where the early data changes of an atomic operation affect its later data changes: the result isn't correct.
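
As for the TRY/CATCH idea from the question: you can wrap the MERGE in one, but it won't let the good rows through. A single MERGE statement is atomic, so the PK violation rolls the whole statement back, and the CATCH block only fires after nothing has been inserted. A minimal sketch to illustrate:

BEGIN TRY
    MERGE INTO dbo.tbl1 AS tbl
        USING (SELECT * FROM #tmp) AS src
        ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
        WHEN NOT MATCHED THEN
            INSERT (col2,col3,col4)
            VALUES (src.col2,src.col3,src.col4);
END TRY
BEGIN CATCH
    -- Fires on the PK violation, but by then the whole MERGE has already
    -- been rolled back: none of this batch's rows were inserted.
    PRINT ERROR_MESSAGE();
END CATCH

So deduplicating the source before (or inside) the MERGE, as the answers here do, is the way to go.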

answered by gbn