
Can NOLOCK cause DISTINCT to fail?

Tags:

sql-server

DISCLAIMER: This is not a general question about the problems of NOLOCK (and therefore not a duplicate of Is the NOLOCK (Sql Server hint) bad practice?); it is a specific question about how NOLOCK and DISTINCT interact in an attempt to better understand SQL Server's inner workings.

As strange as it may seem, it appears to me that NOLOCK may be causing DISTINCT to fail in a certain case. Here is the example:

INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM TableB WITH (NOLOCK)

The above example occasionally produces a PK violation. Here are the other relevant facts:

  • PK of TableA is ID
  • PK of TableB is ID
  • TableA is empty when this starts.
  • Nothing else is writing to TableA during this time.
  • There are updates happening to TableB while the above is running.

My working theory is that 1) the updates on TableB, combined with the use of NOLOCK, are causing duplicate rows to be read, and 2) the optimizer is relying on the fact that TableA has a PK on the same column that we are DISTINCTing, and so doesn't actively perform a DISTINCT operation on the rows that are being returned; it just assumes the rows will already be distinct.

Can anyone confirm this? And if so, is this by design, or a bug in SQL Server?

I originally thought even with dirty reads and the possibility of duplicate rows that DISTINCT would be a guarantee to clean up the duplicates, but the evidence I'm seeing seems to indicate otherwise.

This error was seen on SQL Server 2008R2.

JohnnyM asked Oct 19 '17 17:10



1 Answer

Sure, this can happen. The engine is smart enough to know that, since ID is the primary key of TableB, the rows it reads should already be unique, so it isn't going to waste resources looking for duplicates. However, you have introduced the dreaded NOLOCK hint, and you said that TableB is being updated during this process.
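One way to check whether the DISTINCT was optimized away is to look at the estimated plan for the statement. This is a minimal sketch, assuming the table names from the question and a client such as SSMS or sqlcmd that understands the GO batch separator. While SHOWPLAN_TEXT is ON, the statement is compiled but not executed.

```sql
-- Return the estimated plan as text instead of running the statement.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO lines.
SET SHOWPLAN_TEXT ON;
GO

-- The statement from the question; it is NOT executed while SHOWPLAN_TEXT is ON.
INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM TableB WITH (NOLOCK);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

If the plan goes from the scan of TableB to the insert with no duplicate-removing operator in between (such as a Sort with DISTINCT, a Stream Aggregate, or a Hash Match aggregate), then no duplicate removal is being performed at run time.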

What you are almost certainly experiencing here is one of the side effects of NOLOCK, brought on by page splits. When a page splits while the scan is in progress, rows can move to a page the scan has not reached yet, so the engine can return the same row twice. As I said before, the engine assumes there are no duplicates because you are selecting the primary key, so nothing removes them. This is NOT a bug in SQL Server; it is yet another reason to stop using the hint.
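If the hint really has to stay, one possible workaround is to stage the dirty read in a table that has no uniqueness guarantee and deduplicate from there. The sketch below is an illustration under assumptions, not a tested fix: #Staging is a hypothetical temp table name, and ID is assumed to be an int.

```sql
-- Hypothetical staging table: no PK or unique index on ID, so the optimizer
-- cannot assume the rows are unique and has to perform the DISTINCT itself.
CREATE TABLE #Staging (ID int NOT NULL);  -- assumption: ID is an int

-- Dirty read; this may still return the same ID more than once.
INSERT INTO #Staging (ID)
SELECT ID
FROM TableB WITH (NOLOCK);

-- Duplicates are removed here, before the rows reach TableA's primary key.
INSERT INTO TableA (ID)
SELECT DISTINCT ID
FROM #Staging;

DROP TABLE #Staging;
```

This only avoids the PK violation. A NOLOCK scan can also miss rows entirely, so the result may still be incomplete; removing the hint remains the simpler fix.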

Sean Lange answered Oct 16 '22 05:10