Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server INSERT, Scope_Identity() and physical writing to disc

I have a stored procedure that does, among other stuff, some inserts in different table inside a loop. See the example below for clearer understanding:

INSERT INTO T1 VALUES ('something')

SET @MyID = Scope_Identity()

... some stuff go here

INSERT INTO T2 VALUES (@MyID, 'something else')

... The rest of the procedure

These two tables (T1 and T2) have an IDENTITY(1, 1) column in each one of them, let's call them ID1 and ID2; however, after running the procedure in our production database (very busy database) and having more than 6250 records in each table, I have noticed one incident where ID1 does not match ID2! Although normally for each record inserted in T1, there is record inserted in T2 and the identity column in both is incremented consistently.

The "wrong" records were something like that:

ID1     Col1
----    ---------
4709    data-4709
4710    data-4710

ID2     ID1     Col1
----    ----    ---------
4709    4710    data-4710
4710    4709    data-4709

Note the "inverted", ID1 in the second table.

Knowing not that much about SQL Server underneath operations, I have put the following "theory", maybe someone can correct me on this.

What I think is that because the loop is faster than physically writing to the table, and/or maybe some other thing delayed the writing process, the records were buffered. When it comes the time to write them, they were wrote in no particular order.

Is that even possible if no, how to explain the above mentioned scenario?

If yes, then I have another question to rise. What if the first insert (from the code above) got delayed? Doesn't that mean I won't get the correct IDENTITY to insert into the second table? If the answer of this is also yes, what can I do to insure the insertion in the two tables will happen in sequence with the correct IDENTITY?

I appreciate any comment and information that help me understand this.

Thanks in advance.

like image 227
TheBlueSky Avatar asked Mar 15 '10 10:03

TheBlueSky


3 Answers

There is no way you can rely on IDENTITY to solve this for your second table. If you care about the generated primary key value for that row, you should generate itself.

IDENTITY is a way of saying "I don't want the hassle of generating a key myself, just do it for me, and I'll ask for the generated value if and when I need it".

What could be happening here is that two threads are inserting the rows at the same time, none of them have committed yet, so you get this scenario:

Thread 1                      Thread 2
get id for table 1 = 4709
                              get id for table 1 = 4710
insert row for table 1
                              insert row for table 1
                              get id for table 2 = 4709
get id for table 2 = 4710
                              insert row for table 2
insert row for table 1

You have two ways to solve your problem:

  1. Remove IDENTITY for the primary key in the second table
  2. Use SET IDENTITY_INSERT ON to allow you to provide a key for it, while keeping the IDENTITY setting

In this case, however, I would use method nbr. 1. Method nbr. 2 is usually used when importing data into an empty table. You don't want the risk of the database auto-generating an ID you later on want to use yourself (since it comes from the first table), and so you should disable IDENTITY setting on the primary key of the second table.

Or you could try to avoid relying on the key for that table at all, since you have a foreign key reference, do you really need the key values to be the same?

like image 104
Lasse V. Karlsen Avatar answered Oct 05 '22 14:10

Lasse V. Karlsen


Of course your above scenario is possible - and quite likely, too.

If you have two separate, independent tables, both being used for queries and inserts, both with a separate IDENTITY(1,1) field, there's absolutely no guarantee that an insert into one table and then into the second will be executed in the same order!

If you do need to establish a link between the two, insert the first table's ID into the second table as a foreign key. You cannot rely on the ID's generated from IDENTITY's to be the same in both tables!

like image 28
marc_s Avatar answered Oct 05 '22 14:10

marc_s


Regading writing:

  • Whenever you do something that changes data, this is written to the database LOGS that moment, and you dont get a transaction confirm until this has happened. That is the D in ACID conditions (database theory).
  • Dirty database pages are written to disk "in the background". If too many are dirty, a checkpoint is triggered and they are all dumped out.

So far to the writing part.

Waht you probably run into is simlpy the fact that while individual statements are atomic, a busy atabase has possibly more than one thread running along it. So, basically, a thread switch happened between the statements. One thread got Id1, another one prioerity, id1, id2, then the first one id2.

Nothing specific here ;) Typical normal database behavior when multiple threads run along. Nothing to do with writing per se.

Basically, between SET @MyID = Scope_Identity() and the next statement, another thread can get priority ;)

like image 38
TomTom Avatar answered Oct 05 '22 13:10

TomTom