
How to ignore duplicate keys when extracting data using OPENQUERY while joining two tables?

I am trying to insert records into a MySQL database from MS SQL Server using "OPENQUERY", and I want to ignore duplicate-key errors: when the query runs into a duplicate, it should skip it and keep going.

How can I ignore the duplicates?

Here is what I am doing:

  1. Pulling records from MySQL using "OPENQUERY" to get the MySQL "A.record_id".
  2. Joining those records to records in MS SQL Server on specific criteria rather than a direct id match. From this join I find a new related record identifier, "B.new_id", in SQL Server.
  3. Inserting the results into a new table in MySQL as (A.record_id, B.new_id). In the new table, A.record_id is set as the primary key.

The problem is that when joining table A to table B, I sometimes find 2+ records in table B matching the criteria that I am looking for. This causes the value A.record_id to appear 2+ times in my data set before I insert it into table A, which causes the problem. Note that I can use an aggregate function to eliminate the records.

Jaylen asked Jan 19 '14 18:01



2 Answers

I don't think there is a specific option. But it is easy enough to do:

insert into oldtable(. . .)
    select . . .
    from newtable
    where not exists (select 1 from oldtable where oldtable.id = newtable.id)

If there is more than one set of unique keys, you can add an additional not exists clause for each one.
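As a minimal sketch of the multi-key case, assume (hypothetically) that oldtable has two separate unique keys, id and email. Neither column name comes from the question; they only illustrate the shape of the query:

```sql
-- Hypothetical schema: oldtable(id, email), with id and email each unique.
-- A row from newtable is inserted only if it collides with neither key.
insert into oldtable (id, email)
    select nt.id, nt.email
    from newtable nt
    where not exists (select 1 from oldtable ot where ot.id = nt.id)
      and not exists (select 1 from oldtable ot where ot.email = nt.email);
```

Note that this only guards against rows already present in oldtable. If newtable itself contains the same key twice, the insert can still fail.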

EDIT:

For the revised problem:

insert into oldtable(. . .)
    select . . .
    from (select nt.*, row_number() over (partition by id order by (select null)) as seqnum
          from newtable nt
         ) nt
    where seqnum = 1 and
          not exists (select 1 from oldtable where oldtable.id = nt.id);

The row_number() function assigns a sequential number to each row within a group of rows. The group is defined by the partition by clause, here all rows that share the same id. The numbers start at 1 and increment from there. The order by (select null) clause says that you don't care about the order, so which of the duplicate rows gets 1 is arbitrary. Exactly one row with each id will have a value of 1; any duplicate rows will have a value larger than 1. The condition seqnum = 1 therefore chooses exactly one row per id.
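To see the numbering on concrete rows, here is a small sketch using a table variable with made-up data shaped like the question's (record_id, new_id) pairs. The table variable name and the values are assumptions for illustration only, and the multi-row values list needs SQL Server 2008 or later:

```sql
-- Hypothetical sample data: record_id 1 matches two new_id values.
declare @matches table (record_id int, new_id int);

insert into @matches (record_id, new_id)
values (1, 10), (1, 11), (2, 20);

-- Number the rows within each record_id and keep only the first one.
select m.record_id, m.new_id
from (select mt.*,
             row_number() over (partition by mt.record_id
                                order by (select null)) as seqnum
      from @matches mt
     ) m
where m.seqnum = 1;
-- Returns two rows: one for record_id 1 and one for record_id 2.
-- Which new_id survives for record_id 1 (10 or 11) is not guaranteed.
```

If it matters which duplicate is kept, replace order by (select null) with a real ordering, for example order by mt.new_id to keep the smallest new_id.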

Gordon Linoff answered Nov 15 '22 00:11



If you are on SQL Server 2008+, you can use MERGE to do an INSERT if the row does not exist, or an UPDATE if it does.

Example:

MERGE
INTO    dataValue dv
USING   tmp_holding_DataValue t
ON      t.dateStamp = dv.dateStamp
        AND t.itemId = dv.itemId
WHEN NOT MATCHED THEN
INSERT  (dateStamp, itemId, value)
VALUES  (t.dateStamp, t.itemId, t.value);
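
One caveat: MERGE compares source rows against the target, not against each other, so if the source holds the same key twice, both rows are treated as not matched and the insert can still hit a duplicate key. A possible sketch, assuming (dateStamp, itemId) is the unique key as the ON clause suggests, is to deduplicate the source with row_number() first:

```sql
-- Assumption: (dateStamp, itemId) is the unique key of dataValue.
-- The derived table keeps one arbitrary row per key before merging.
merge into dataValue as dv
using (
    select h.dateStamp, h.itemId, h.value
    from (select src.*,
                 row_number() over (partition by src.dateStamp, src.itemId
                                    order by (select null)) as seqnum
          from tmp_holding_DataValue src
         ) h
    where h.seqnum = 1
) as t
on  t.dateStamp = dv.dateStamp
    and t.itemId = dv.itemId
when not matched then
    insert (dateStamp, itemId, value)
    values (t.dateStamp, t.itemId, t.value);
```

The statement ends with a semicolon because SQL Server requires MERGE to be terminated with one.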
Barb C. Goldstein answered Nov 15 '22 00:11
