I have the following tables:
DataValue

    DateStamp   ItemId  Value
    ----------  ------  -----
    2012-05-22  1       6541
    2012-05-22  2       12321
    2012-05-21  3       32
tmp_holding_DataValue

    DateStamp   ItemId  Value
    ----------  ------  -----
    2012-05-22  1       6541
    2012-05-22  4       87
    2012-05-21  5       234
DateStamp and ItemId are the primary key columns.
I'm doing an insert which runs periodically throughout the day (in a stored procedure):
    insert into DataValue (DateStamp, ItemId, Value)
    select DateStamp, ItemId, Value
    from tmp_holding_DataValue;
This moves data from the holding table (tmp_holding_DataValue) into the main data table (DataValue). The holding table is then truncated.
The problem is that, as in the example, the holding table can contain items which already exist in the main table. Since the primary key does not allow duplicate values, the procedure fails.
One option would be to add a WHERE clause to the insert in the procedure, but the main data table has 10 million+ rows, so this could take a long time.
Is there any other way to get the procedure to just skip-over/ignore the duplicates as it tries to insert?
In MySQL, use the INSERT IGNORE statement rather than plain INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error.
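Applied to the tables in the question, that approach might look like the following (note this is MySQL-only syntax; INSERT IGNORE is not valid in SQL Server):

```sql
INSERT IGNORE INTO DataValue (DateStamp, ItemId, Value)
SELECT DateStamp, ItemId, Value
FROM tmp_holding_DataValue;
```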
In Oracle, the simplest method is to add a hint to the query. Added in 11.2, the ignore_row_on_dupkey_index hint silently ignores duplicate values:

    insert /*+ ignore_row_on_dupkey_index ( acct ( username ) ) */
    into accounts acct ( username, given_name )
    select username, given_name
    from accounts_stage;
If duplicate rows have already made it into a table, one way to clean them up is to assign each row a ROW_NUMBER() partitioned by the key columns, then delete the rows whose row number is greater than 1, leaving a single copy of each key.
In PostgreSQL, I have found a few ways to ignore duplicate inserts. One is a PL/pgSQL block that catches unique constraint violations, taking no action:

    BEGIN
        INSERT INTO db_table (tbl_column) VALUES (v_tbl_column);
    EXCEPTION WHEN unique_violation THEN
        -- Ignore duplicate inserts.
    END;
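On PostgreSQL 9.5 and later, ON CONFLICT DO NOTHING avoids the exception handler entirely; applied to the question's tables it might look like:

```sql
INSERT INTO DataValue (DateStamp, ItemId, Value)
SELECT DateStamp, ItemId, Value
FROM tmp_holding_DataValue
ON CONFLICT (DateStamp, ItemId) DO NOTHING;
```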
For SQL Server, you can filter out the rows that already exist with NOT EXISTS, which the engine can evaluate as a seek against the primary key:

    INSERT dbo.DataValue (DateStamp, ItemId, Value)
    SELECT DateStamp, ItemId, Value
    FROM dbo.tmp_holding_DataValue AS t
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.DataValue AS d
        WHERE d.DateStamp = t.DateStamp
          AND d.ItemId = t.ItemId
    );
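Another SQL Server option (2008 and later) is MERGE, which inserts only the source rows that have no match in the target; a sketch against the question's tables:

```sql
MERGE dbo.DataValue AS d
USING dbo.tmp_holding_DataValue AS t
    ON d.DateStamp = t.DateStamp
   AND d.ItemId = t.ItemId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (DateStamp, ItemId, Value)
    VALUES (t.DateStamp, t.ItemId, t.Value);
```

Note that if the holding table itself contains duplicate keys, both NOT EXISTS and MERGE can still hit a key violation; deduplicate the source first (e.g. with DISTINCT or GROUP BY on the key columns).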