
Skip-over/ignore duplicate rows on insert

I have the following tables:

DataValue

DateStamp    ItemId   Value
----------   ------   -----
2012-05-22   1        6541
2012-05-22   2        12321
2012-05-21   3        32

tmp_holding_DataValue

DateStamp    ItemId   Value
----------   ------   -----
2012-05-22   1        6541
2012-05-22   4        87
2012-05-21   5        234

DateStamp and ItemId are the primary key columns.

I'm doing an insert which runs periodically throughout the day (in a stored procedure):

insert into DataValue(DateStamp, ItemId, Value)
select DateStamp, ItemId, Value
from tmp_holding_DataValue;

This moves data from the holding table (tmp_holding_DataValue) across into the main data table (DataValue). The holding table is then truncated.

The problem is that, as in the example, the holding table can contain rows that already exist in the main table. Because the primary key does not allow duplicate values, the insert fails and so does the procedure.

One option would be to add a where clause to the insert in the procedure, but the main data table has 10 million+ rows, and this could take a long time.

Is there any other way to get the procedure to just skip-over/ignore the duplicates as it tries to insert?

finoutlook asked May 22 '12 14:05



1 Answer

INSERT dbo.DataValue(DateStamp, ItemId, Value)
SELECT t.DateStamp, t.ItemId, t.Value
FROM dbo.tmp_holding_DataValue AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.DataValue AS d
    WHERE d.DateStamp = t.DateStamp
      AND d.ItemId = t.ItemId
);
Aaron Bertrand answered Sep 20 '22 21:09
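
For anyone who wants to try the NOT EXISTS pattern from the answer above on the question's sample rows, here is a minimal sketch. It is an illustration only and makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the SQL differs slightly (INSERT INTO, no dbo schema, DELETE instead of TRUNCATE), and the helper names build_sample_db and move_new_rows are hypothetical, not part of the original procedure.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DataValue (
            DateStamp TEXT NOT NULL,
            ItemId    INTEGER NOT NULL,
            Value     INTEGER,
            PRIMARY KEY (DateStamp, ItemId)
        );
        -- Assumption: the holding table has no key of its own.
        CREATE TABLE tmp_holding_DataValue (
            DateStamp TEXT NOT NULL,
            ItemId    INTEGER NOT NULL,
            Value     INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO DataValue VALUES (?, ?, ?)",
        [("2012-05-22", 1, 6541), ("2012-05-22", 2, 12321), ("2012-05-21", 3, 32)],
    )
    conn.executemany(
        "INSERT INTO tmp_holding_DataValue VALUES (?, ?, ?)",
        [("2012-05-22", 1, 6541), ("2012-05-22", 4, 87), ("2012-05-21", 5, 234)],
    )
    conn.commit()
    return conn


def move_new_rows(conn):
    """Copy holding rows whose key is absent from DataValue, then empty the holding table.

    Returns the number of rows inserted.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO DataValue (DateStamp, ItemId, Value)
        SELECT t.DateStamp, t.ItemId, t.Value
        FROM tmp_holding_DataValue AS t
        WHERE NOT EXISTS (
            SELECT 1
            FROM DataValue AS d
            WHERE d.DateStamp = t.DateStamp
              AND d.ItemId = t.ItemId
        )
        """
    )
    inserted = conn.total_changes - before
    # SQLite has no TRUNCATE; an unfiltered DELETE plays that role here.
    conn.execute("DELETE FROM tmp_holding_DataValue")
    conn.commit()
    return inserted


if __name__ == "__main__":
    db = build_sample_db()
    print("rows inserted:", move_new_rows(db))
    for row in db.execute("SELECT * FROM DataValue ORDER BY ItemId"):
        print(row)
```

The row for 2012-05-22 / ItemId 1 is already in the main table, so it is skipped without raising a key violation, and only the rows for ItemId 4 and 5 are copied. Because the inner query filters on both key columns, the existence check can use the primary key rather than scanning the main table, though actual timings on a 10 million row SQL Server table would need to be measured there.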