I've got a simple queue implementation in MS Sql Server 2008 R2. Here's the essense of the queue:
CREATE TABLE ToBeProcessed
(
Id BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Priority] INT DEFAULT(100) NOT NULL,
IsBeingProcessed BIT default (0) NOT NULL,
SomeData nvarchar(MAX) NOT null
)
I want to atomically select the top n rows ordered by the priority and the id where IsBeingProcessed is false and update those rows to say they are being processed. I thought I'd use a combination of Update, Top, Output and Order By but unfortunately you can't use top and order by in an Update statement.
So I've made an in clause to restrict the update and that sub query does the order by (see below). My question is, is this whole statement atomic, or do I need to wrap it in a transaction?
DECLARE @numberToProcess INT = 2
CREATE TABLE #IdsToProcess
(
Id BIGINT NOT null
)
UPDATE
ToBeProcessed
SET
ToBeProcessed.IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
WHERE
ToBeProcessed.Id IN
(
SELECT TOP(@numberToProcess)
ToBeProcessed.Id
FROM
ToBeProcessed
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC)
SELECT
*
FROM
#IdsToProcess
DROP TABLE #IdsToProcess
Here's some sql to insert some dummy rows:
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
INSERT INTO ToBeProcessed (SomeData) VALUES (N'');
A subquery can return atomic values (one column with one row in it – also known as a scalar query), row values (one row for one or many columns), or table values (one or many rows for one or many columns).
Like SELECT , the UPDATE statement can have a subquery in several places or clauses. In an UPDATE , the two clauses in which subqueries are used most commonly are SET and WHERE . The SET clause is where we define the new value for the column being modified by the UPDATE .
A subquery is a query that is nested inside a SELECT , INSERT , UPDATE , or DELETE statement, or inside another subquery.
If I understand the motivation for the question you want to avoid the possibility that two concurrent transactions could both execute the sub query to get the top N rows to process then proceed to update the same rows?
In that case I'd use this approach.
;WITH cte As
(
SELECT TOP(@numberToProcess)
*
FROM
ToBeProcessed WITH(UPDLOCK,ROWLOCK,READPAST)
WHERE
ToBeProcessed.IsBeingProcessed = 0
ORDER BY
ToBeProcessed.Id,
ToBeProcessed.Priority DESC
)
UPDATE
cte
SET
IsBeingProcessed = 1
OUTPUT
INSERTED.Id
INTO
#IdsToProcess
I was a bit uncertain earlier whether SQL Server would take U
locks when processing your version with the sub query thus blocking two concurrent transactions from reading the same TOP N
rows. This does not appear to be the case.
CREATE TABLE JobsToProcess
(
priority INT IDENTITY(1,1),
isprocessed BIT ,
number INT
)
INSERT INTO JobsToProcess
SELECT TOP (1000000) 0,0
FROM master..spt_values v1, master..spt_values v2
BEGIN TRY
DECLARE @FinishedMessage VARBINARY (128) = CAST('TestFinished' AS VARBINARY (128))
DECLARE @SynchMessage VARBINARY (128) = CAST('TestSynchronising' AS VARBINARY (128))
SET CONTEXT_INFO @SynchMessage
DECLARE @OtherSpid int
WHILE(@OtherSpid IS NULL)
SELECT @OtherSpid=spid
FROM sys.sysprocesses
WHERE context_info=@SynchMessage and spid<>@@SPID
SELECT @OtherSpid
DECLARE @increment INT = @@spid
DECLARE @number INT = @increment
WHILE (@number = @increment AND NOT EXISTS(SELECT * FROM sys.sysprocesses WHERE context_info=@FinishedMessage))
UPDATE JobsToProcess
SET @number=number +=@increment,isprocessed=1
WHERE priority = (SELECT TOP 1 priority
FROM JobsToProcess
WHERE isprocessed=0
ORDER BY priority DESC)
SELECT *
FROM JobsToProcess
WHERE number not in (0,@OtherSpid,@@spid)
SET CONTEXT_INFO @FinishedMessage
END TRY
BEGIN CATCH
SET CONTEXT_INFO @FinishedMessage
SELECT ERROR_MESSAGE(), ERROR_NUMBER()
END CATCH
Almost immediately execution stops as both concurrent transactions update the same row so the S
locks taken whilst identifying the TOP 1 priority
must get released before it aquires a U
lock then the 2 transactions proceed to get the row U
and X
lock in sequence.
If a CI is added ALTER TABLE JobsToProcess ADD PRIMARY KEY CLUSTERED (priority)
then deadlock occurs almost immediately instead as in this case the row S
lock doesn't get released, one transaction aquires a U
lock on the row and waits to convert it to an X
lock and the other transaction is still waiting to convert its S
lock to a U
lock.
If the query above is changed to use MIN
rather than TOP
WHERE priority = (SELECT MIN(priority)
FROM JobsToProcess
WHERE isprocessed=0
)
Then SQL Server manages to completely eliminate the sub query from the plan and takes U
locks all the way.
Every individual T-SQL statement is, according to all my experience and all the documenation I've ever read, supposed to be atomic. What you have there is a single T-SQL statement, ergo is should be atomic and will not require explicit transaction statements. I've used this precise kind of logic many times, and never had a problem with it. I look forward to seeing if anyone as a supportable alternate opinion.
Incidentally, look into the ranking functions, specifically row_number(), for retrieving a set number of items. The syntax is perhaps a tad awkward, but overall they are flexible and powerful tools. (There are about a bazillion Stack Overlow questions and answers that discuss them.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With