I am currently working on a SSIS package that is extracting a table from one database to another. The tables in both of the databases use the same column as the primary key. My select statement to extract the data is a simple select statement. When I ran the package I was receiving a error that there where duplicate primary key values.
I reviewed my select statement and verified that my select statement was not returning duplicate rows. So to test this I removed the primary key from the table that I am inserting the data into and reran the SSIS package. After it ran I looked at the table to see what rows where being duplicated. What I found was that rows that where being edited while the extract was running where being duplicated, there was a record of it before the edit, and a record of it after the edit. I could easily tell this because the table has a last modified field that gets updated every time a record is updated.
I added a NOLOCK hint to my select statement, and it stopped returning duplicate rows.
So my question is why? I would have expected that a select statement with a NOLOCK table hint would have a higher chance of returning duplicate rows because it is not using locking, and that a select statement without the NOLOCK hint should use locking to make sure that it does not return duplicate rows.
Here is the select statement that I am using to select the data. I did verify that the joins are not causing it to duplicate rows:
SELECT pe.enc_id,
pe.enc_nbr,
pe.billable_ind,
pe.clinical_ind AS clinical_ind,
pe.budget_ind,
pe.print_stmt_ind,
pe.send_coll_letter_ind,
pe.outsource_exempt_ind,
cb.First_name + ' ' + cb.last_name AS CreatedBy,
pe.create_timestamp AS create_timestamp,
mb.first_name + ' ' + mb.last_name AS ModifiedBy,
pe.modify_timestamp AS modify_timestamp
FROM patient_encounter pe WITH(NOLOCK)
LEFT OUTER JOIN user_mstr cb WITH(NOLOCK) ON
pe.created_by = cb.user_id
LEFT OUTER JOIN user_mstr mb WITH(NOLOCK) ON
pe.modified_by = mb.user_id
NOLOCK hint causes dirty read anomallies, and one such anomaly is a duplicate read. Such reads are frequent if an update changes the position of the row in the index scanned by the query:
So the SELECT has read a row twice, once while it had the key value 1 and once while it had a key value 3. This is just a trivial example of what can happen. In reality more complicated queries can run complex plans and use other indexes, all subject to such anomalies.
In short: NOLOCK hint is evil. If you want to avoid contention, use snapshot isolation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With