Can NOLOCK cause DISTINCT to fail?

Tags:

sql-server

DISCLAIMER: This is not a general question about the problems of NOLOCK (and therefore not a duplicate of Is the NOLOCK (Sql Server hint) bad practice?); it is a specific question about how NOLOCK and DISTINCT interact in an attempt to better understand SQL Server's inner workings.

As strange as it may seem, it appears to me that NOLOCK may be causing DISTINCT to fail in a certain case. Here is the example:

INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM TableB WITH (NOLOCK)

The above example occasionally produces a PK violation. Here are the other relevant facts:

PK of TableA is ID
PK of TableB is ID
TableA is empty when this starts.
Nothing else is writing to TableA during this time.
There are updates happening to TableB while the above is running.

My working theory is that 1) the updates on TableB, combined with the use of NOLOCK, are producing duplicate data, and 2) the optimizer is relying on the fact that TableA has a PK on the same column that we are DISTINCTing, so it doesn't actually perform a DISTINCT operation on the returned rows; it just assumes they will already be distinct.
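One way to test the second half of this theory is to look at the estimated plan without running the statement. This is a minimal sketch that assumes two scratch tables, `dbo.DemoA` and `dbo.DemoB`. These are hypothetical names standing in for TableA and TableB, with the same primary keys:

```sql
-- Hypothetical scratch tables that mirror the keys described in the question.
CREATE TABLE dbo.DemoA (ID int NOT NULL PRIMARY KEY);
CREATE TABLE dbo.DemoB (ID int NOT NULL PRIMARY KEY, Val int NULL);
GO
-- Return the estimated plan as XML instead of executing the next batch.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
INSERT INTO dbo.DemoA (ID)
SELECT DISTINCT ID
FROM dbo.DemoB WITH (NOLOCK);
GO
SET SHOWPLAN_XML OFF;
GO
-- Clean up the scratch tables.
DROP TABLE dbo.DemoA, dbo.DemoB;
GO
```

Suppose the plan goes from the scan of DemoB to the Clustered Index Insert with no Stream Aggregate, Hash Match (Aggregate) or Distinct Sort in between. In that case the DISTINCT has been optimized away. The uniqueness that justifies removing it comes from the key on the table being read (DemoB.ID).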

Can anyone confirm this? And if so, is this by design, or a bug in SQL Server?

I originally thought that, even with dirty reads and the possibility of duplicate rows, DISTINCT would be guaranteed to remove the duplicates, but the evidence I'm seeing seems to indicate otherwise.
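DISTINCT only does real work when the optimizer cannot prove that the rows are already unique. The sketch below illustrates this using the question's TableA and TableB. It stages the IDs in a temporary table, `#StagedIDs`, which is a hypothetical name. Because SELECT ... INTO copies the column but not TableB's primary key, the optimizer can no longer prove uniqueness, so it has to perform the DISTINCT in the second statement. The sketch still reads through NOLOCK. It can therefore still miss rows or pick up uncommitted ones; it only removes the duplicates.

```sql
-- Stage the IDs; SELECT ... INTO copies the column but not TableB's primary key.
SELECT ID
INTO #StagedIDs
FROM TableB WITH (NOLOCK);

-- #StagedIDs has no unique constraint, so the optimizer cannot assume the
-- rows are distinct and must actually remove duplicates here.
INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM #StagedIDs;

DROP TABLE #StagedIDs;
```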

This error was seen on SQL Server 2008 R2.


asked Oct 19 '17 17:10

JohnnyM

1 Answer

Sure, this can happen. The engine is smart enough to know that since ID is your primary key, there is no point wasting resources looking for duplicates. However, you have introduced the dreaded NOLOCK hint, and you said that TableB is being updated during this process.

What you are almost certainly experiencing here is one of the side effects of NOLOCK brought on by page splits. When an update causes a page split, some rows are moved to a new page. A NOLOCK scan that has already read those rows can then read them again when it reaches the new page. The engine therefore returns duplicate rows, even though, as I said before, it assumes there can't be any because you are selecting the primary key. This is NOT a bug in SQL Server; it is yet another reason to stop using the hint.
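If the dirty reads are not actually needed, removing the hint is the simplest fix. If NOLOCK was there to avoid blocking against the concurrent updates to TableB, row versioning is the usual alternative. The sketch below assumes a database named `MyDb`, which is a hypothetical placeholder. It also assumes that enabling snapshot isolation is acceptable there, since it keeps row versions in tempdb:

```sql
-- One-time setting; MyDb is a placeholder for the real database name.
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- Under SNAPSHOT the SELECT reads a transactionally consistent view of TableB
-- as of the point the transaction first accesses data. Each ID therefore
-- appears once, and the reader takes no shared locks that would block writers.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

-- DISTINCT is redundant here because ID is TableB's primary key.
INSERT INTO TableA (ID)
SELECT ID
FROM TableB;

COMMIT TRANSACTION;
```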


answered Oct 16 '22 05:10

Sean Lange
