Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MERGE INTO insertion order

I have a statement that looks something like this:

MERGE INTO someTable st
USING
(
    SELECT id,field1,field2,etc FROM otherTable
) ot on st.field1=ot.field1
WHEN NOT MATCHED THEN
    INSERT (field1,field2,etc)
    VALUES (ot.field1,ot.field2,ot.etc)

where otherTable has an autoincrementing id field.

I would like the insertion into someTable to be in the same order as the id field of otherTable, such that the order of ids is preserved when the non-matching fields are inserted.

A quick look at the docs would appear to suggest that there is no feature to support this.

Is this possible, or is there another way to do the insertion that would fulfil my requirements?

EDIT: One approach to this would be to add an additional field to someTable that captures the ordering. I'd rather not do this if possible.

... upon reflection the approach above seems like the way to go.

like image 278
spender Avatar asked May 19 '09 17:05

spender


People also ask

Can we use order by in MERGE statement?

You cannot use the ORDER BY clause alone because the ORDER BY clause uses a SQL Server collation to sort string values. The use of the SQL Server collation might result in a different sort order than Windows collation, which can cause the Merge or Merge Join transformation to produce unexpected results.

Which is faster MERGE or insert?

Answer. Testing with a variety of source row sets against a target with about 6 mio. rows showed a slighty time advance using the merge command. Overall less internal steps are performed in the merge compared to delete/insert.

Is MERGE a DML statement?

This statement is a convenient way to combine multiple operations. It lets you avoid multiple INSERT , UPDATE , and DELETE DML statements. MERGE is a deterministic statement. That is, you cannot update the same row of the target table multiple times in the same MERGE statement.

Does SQL insert in order?

The only thing that order by on an insert is guaranteed to do is assign the values of an identity column if one exists. Your select has no order by, hence SQL is in no way required to return the data in any particular order.


2 Answers

I cannot speak to what the Questioner is asking for here because it doesn't make any sense.

So let's assume a different problem:

Let's say, instead, that I have a Heap-Table with no Identity-Field, but it does have a "Visited" Date field.
The Heap-Table logs Person WebPage Visits and I'm loading it into my Data Warehouse.
In this Data Warehouse I'd like to use the Surrogate-Key "WebHitID" to reference these relationships.
Let's use Merge to do the initial load of the table, then continue calling it to keep the tables in sync.

I know that if I'm inserting records into an table, then I'd prefer the ID's (that are being generated by an Identify-Field) to be sequential based on whatever Order-By I choose (let's say the "Visited" Date).
It is not uncommon to expect an Integer-ID to correlate to when it was created relative to the rest of the records in the table.
I know this is not always 100% the case, but humor me for a moment.

This is possible with Merge.

Using (what feels like a hack) TOP will allow for Sorting in our Insert:

MERGE DW.dbo.WebHit AS Target --This table as an Identity Field called WebHitID.
USING
(
    SELECT TOP 9223372036854775807 --Biggest BigInt (to be safe).
           PWV.PersonID, PWV.WebPageID, PWV.Visited
      FROM ProdDB.dbo.Person_WebPage_Visit AS PWV
     ORDER BY PWV.Visited --Works only with TOP when inside a MERGE statement.
) AS Source
  ON Source.PersonID  = Target.PersonID
 AND Source.WebPageID = Target.WebPageID
 AND Source.Visited   = Target.Visited
WHEN NOT MATCHED BY Target THEN --Not in Target-Table, but in Source-Table.
    INSERT (PersonID, WebPageID, Visited) --This Insert populates our WebHitID.
    VALUES (Source.PersonID, Source.WebPageID, Source.Visited)
WHEN NOT MATCHED BY Source THEN --In Target-Table, but not in Source-Table.
    DELETE --In case our WebHit log in Prod is archived/trimmed to save space.
;


You can see I opted to use TOP 9223372036854775807 (the biggest Integer there is) to pull everything.
If you have the resources to merge more than that, then you should be chunking it out.
While this screams "hacky workaround" to me, it should get you where you need to go.

I have tested this on a small sample set and verified it works. I have not studied the performance impact of it on larger complex sets of data though, so YMMV with and without the TOP.

like image 157
MikeTeeVee Avatar answered Sep 24 '22 01:09

MikeTeeVee


Following up on MikeTeeVee's answer.

Using TOP will allow you to Order By within a sub-query, however instead of TOP 9223372036854775807, I would go with

SELECT TOP 100 PERCENT 

Unlikely to reach that number, but this way just makes more sense and looks cleaner.

like image 31
MMartin Avatar answered Sep 23 '22 01:09

MMartin