Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL Is a sub query for an Update restriction Atomic with the update?

I've got a simple queue implementation in MS Sql Server 2008 R2. Here's the essense of the queue:

CREATE TABLE ToBeProcessed 
(
    Id BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
    [Priority] INT DEFAULT(100) NOT NULL,
    IsBeingProcessed BIT default (0) NOT NULL,
    SomeData nvarchar(MAX) NOT null
)

I want to atomically select the top n rows ordered by the priority and the id where IsBeingProcessed is false and update those rows to say they are being processed. I thought I'd use a combination of Update, Top, Output and Order By but unfortunately you can't use top and order by in an Update statement.

So I've made an in clause to restrict the update and that sub query does the order by (see below). My question is, is this whole statement atomic, or do I need to wrap it in a transaction?

DECLARE @numberToProcess INT = 2

CREATE TABLE #IdsToProcess
(
    Id BIGINT NOT null
)

UPDATE 
    ToBeProcessed
SET
    ToBeProcessed.IsBeingProcessed = 1
OUTPUT 
    INSERTED.Id 
INTO
    #IdsToProcess   
WHERE
    ToBeProcessed.Id IN 
    (
        SELECT TOP(@numberToProcess) 
            ToBeProcessed.Id 
        FROM 
            ToBeProcessed 
        WHERE
            ToBeProcessed.IsBeingProcessed = 0
        ORDER BY 
            ToBeProcessed.Id, 
            ToBeProcessed.Priority DESC)

SELECT 
    *
FROM 
    #IdsToProcess

DROP TABLE #IdsToProcess

Here's some sql to insert some dummy rows:

INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
like image 668
Daniel James Bryars Avatar asked Nov 04 '10 13:11

Daniel James Bryars


People also ask

Is subquery Atomic?

A subquery can return atomic values (one column with one row in it – also known as a scalar query), row values (one row for one or many columns), or table values (one or many rows for one or many columns).

Is subquery allowed in update statement?

Like SELECT , the UPDATE statement can have a subquery in several places or clauses. In an UPDATE , the two clauses in which subqueries are used most commonly are SET and WHERE . The SET clause is where we define the new value for the column being modified by the UPDATE .

What is SQL Server subquery?

A subquery is a query that is nested inside a SELECT , INSERT , UPDATE , or DELETE statement, or inside another subquery.


2 Answers

If I understand the motivation for the question you want to avoid the possibility that two concurrent transactions could both execute the sub query to get the top N rows to process then proceed to update the same rows?

In that case I'd use this approach.

;WITH cte As
(
SELECT TOP(@numberToProcess) 
            *
        FROM 
            ToBeProcessed WITH(UPDLOCK,ROWLOCK,READPAST) 
        WHERE
            ToBeProcessed.IsBeingProcessed = 0
        ORDER BY 
            ToBeProcessed.Id, 
            ToBeProcessed.Priority DESC
)            
UPDATE 
    cte
SET
    IsBeingProcessed = 1
OUTPUT 
    INSERTED.Id 
INTO
    #IdsToProcess  

I was a bit uncertain earlier whether SQL Server would take U locks when processing your version with the sub query thus blocking two concurrent transactions from reading the same TOP N rows. This does not appear to be the case.

Test Table

CREATE TABLE JobsToProcess
(
priority INT IDENTITY(1,1),
isprocessed BIT ,
number INT
)

INSERT INTO JobsToProcess
SELECT TOP (1000000) 0,0
FROM master..spt_values v1, master..spt_values v2

Test Script (Run in 2 concurrent SSMS sessions)

BEGIN TRY
DECLARE @FinishedMessage VARBINARY (128) = CAST('TestFinished' AS  VARBINARY (128))
DECLARE @SynchMessage VARBINARY (128) = CAST('TestSynchronising' AS  VARBINARY (128))
SET CONTEXT_INFO @SynchMessage

DECLARE @OtherSpid int

WHILE(@OtherSpid IS NULL)
SELECT @OtherSpid=spid 
FROM sys.sysprocesses 
WHERE context_info=@SynchMessage and spid<>@@SPID

SELECT @OtherSpid


DECLARE @increment INT = @@spid
DECLARE @number INT = @increment

WHILE (@number = @increment AND NOT EXISTS(SELECT * FROM sys.sysprocesses WHERE context_info=@FinishedMessage))
UPDATE JobsToProcess 
SET @number=number +=@increment,isprocessed=1
WHERE priority = (SELECT TOP 1 priority 
                   FROM JobsToProcess 
                   WHERE isprocessed=0 
                   ORDER BY priority DESC)

SELECT * 
FROM JobsToProcess 
WHERE number not in (0,@OtherSpid,@@spid)
SET CONTEXT_INFO @FinishedMessage
END TRY
BEGIN CATCH
SET CONTEXT_INFO @FinishedMessage
SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH

Almost immediately execution stops as both concurrent transactions update the same row so the S locks taken whilst identifying the TOP 1 priority must get released before it aquires a U lock then the 2 transactions proceed to get the row U and X lock in sequence.

Heap

If a CI is added ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority) then deadlock occurs almost immediately instead as in this case the row S lock doesn't get released, one transaction aquires a U lock on the row and waits to convert it to an X lock and the other transaction is still waiting to convert its S lock to a U lock.

Clustered Index

If the query above is changed to use MIN rather than TOP

WHERE priority = (SELECT MIN(priority)
                   FROM JobsToProcess 
                   WHERE isprocessed=0 
                   )

Then SQL Server manages to completely eliminate the sub query from the plan and takes U locks all the way.

enter image description here

like image 84
Martin Smith Avatar answered Sep 25 '22 03:09

Martin Smith


Every individual T-SQL statement is, according to all my experience and all the documenation I've ever read, supposed to be atomic. What you have there is a single T-SQL statement, ergo is should be atomic and will not require explicit transaction statements. I've used this precise kind of logic many times, and never had a problem with it. I look forward to seeing if anyone as a supportable alternate opinion.

Incidentally, look into the ranking functions, specifically row_number(), for retrieving a set number of items. The syntax is perhaps a tad awkward, but overall they are flexible and powerful tools. (There are about a bazillion Stack Overlow questions and answers that discuss them.)

like image 23
Philip Kelley Avatar answered Sep 21 '22 03:09

Philip Kelley