Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this a 'thread-safe' way to update and return a row in SQL from a queue?

I'm implementing a queue in SQL by having a table that has a 'claimedby' column. Any one of a number of processes can grab an item from this queue by populating the claimedby field.

What I want to do is prevent two processes grabbing the same item.

I've done some research and found a few possibilities of how to do this. I don't want to use the broker, I do want to stick with a simple table.

I am using C# to work on the item that is retrieved.

I've been playing around in SQL and written the following:

declare @test table (ID int);

UPDATE TOP (1) CatQueueItem SET ClaimedBy = 'Mittens'
OUTPUT inserted.CatQueueItemIdId into @test
WHERE ClaimedBy is null

This will be done as a single query in C# by adding the parameter as an output parameter, etc.

I'm wondering if this code will work or if I need to think about locking and transanctions to ensure that if multiple processes run this query simultaneously that it will function as expected (only one process will claim the item, the other updates will skip this row entirely due to it being updated by the first).

Any ideas?

like image 876
NibblyPig Avatar asked Mar 25 '15 17:03

NibblyPig


People also ask

Is SQL thread safe?

sql. Connection interface and is not thread-safe, according to its Javadoc: SQLServerConnection is not thread-safe, however multiple statements created from a single connection can be processing simultaneously in concurrent threads.

Does UPDATE query locks the row?

Think of it this way -- It locks every row it had to look at. No index on the column -- It had to check every row, so all rows are locked. That effectively locks the entire table. UNIQUE index on the column -- Only one row need be touched, hence, locked.

How do I return a row that was updated in SQL?

Use the RETURNING clause. The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM , can be computed. The new (post-update) values of the table's columns are used.


2 Answers

Every query in SQL Server operates within an implicit transaction. Since you are using a single statement here (not multiple queries) the engine will handle locking and blocking for you.

As long as you are handling the case where no record is updated in your C# code this should handle concurrency just fine.

like image 28
JNK Avatar answered Nov 08 '22 19:11

JNK


While the query given will not give a row to multiple threads due to an implicit transaction that is created, it can cause issues where one thread manages to hog the queue. If you want to stop this you might want to add the READ PAST and ROW LOCK hints to the query. This prevents the lock that one thread gets from blocking the other threads from getting a row.

For example:

UPDATE TOP (1) CatQueueItem WITH (READPAST, ROWLOCK)
SET ClaimedBy = 'Mittens'
OUTPUT inserted.CatQueueItemIdId into @test
WHERE ClaimedBy is null

The Long Explanation

All statements in SQL Server run within the context of a Transaction. If a statement is run and there is no transaction specified SQL Server creates an implicit transaction just for that single statement.

There are three main types of locks that SQL Server uses Shared (S), Update (U) and Exclusive (X). Locks are acquired within the scope of a Transaction. If a transaction has an S lock on a row other transactions can get an S or U lock but not an X lock on the same row. If a transaction has a U lock on a row other transactions can only get S locks on that row. If a transaction has an X lock on a row no other transaction can get a lock on the row. In order to write to a row a transaction needs to have an X lock as this has the effect of blocking all other transactions from reading the row while it is part way through updating it.

This gives the following compatibility table:

    S  U  X
  ----------
S | Y  Y  N
U | Y  N  N
X | N  N  N

The update that is in the question has two parts to it. First it does a read of the table to find a row that has a null in ClaimedBy. Once it has found the row the second part of the operation updates the row that was found.

Normally when reading from tables SQL Server uses S locks as these don’t prevent other transactions from also reading the rows and increases read performance, but it does stop other transactions getting X locks to write to the rows. The problem with this is that when the second part of the update query tries to upgrade to an X lock, so it can write to the row, it can cause a deadlock. The reason for this is that the first part of the query in another transaction may have acquired an S lock the same as your transaction but might not have upgraded it yet. This prevents your transaction upgrading its lock to an X lock and your S lock prevents the other transaction upgrading either. Neither of the transactions will be able to succeed so they deadlock. In this case SQL Server picks one transaction and rolls it back.

To stop the deadlock happening, when doing the read part of an update statement SQL Server uses U locks. U locks allow other transactions to acquire S locks allowing those that are doing reads to succeed, but it does not allow other U locks. By using a U lock you are saying that you are only reading, but you have the intention of writing at some point in the future. This prevents the situation where you have two transactions that are both trying to upgrade to an X lock. As such the transaction that has the U lock can upgrade to X safe in the knowledge that it won’t deadlock with another transaction by doing so.

The relevance of all of this to the scenario given in the question is that the U locks used by one thread’s transaction to lock rows while searching for an available row block out all the other thread’s transactions. This is because when a transaction tries to acquire a lock on a row that already has an incompatible lock on it, it simply waits in a queue until the blocking lock is unlocked. Because all the threads are searching the same rows for a free one they all try to get U locks on the same row and all form up into an orderly queue waiting to get a U lock on the same row. In other words only one thread’s transaction is allowed to look for free rows at a time.

What the READPAST table hint does is it stops the transactions queueing to read rows in the table. With READPAST when a transaction tires to acquire a lock on a row that is already locked instead of joining a queue for the lock it says stuff this and goes and tries the next row. In this case it will say I don’t know if the row has a ClaimedBy value or not, I’m not prepared to wait to find out, so I will just assume that it does and try the next row. This could mean that it skips available rows but won’t get an unavailable row. This will improve the speed at which the threads and their transactions can obtain items from the queue as they can all look for available rows at the same time.

Acquiring locks can potentially be quite expensive. It takes time and memory. In order to combat this issue SQL Server has several granularities for locks. You can lock the whole database, a whole table, a page of a table or a row of a table. The query optimizer will try to use statistics to predict how many rows need to be locked. If there are a lot, it will pick a page or table lock instead of a row lock. This has the effect of it requiring fewer locks overall.

The ROWLOCK table hint tells SQL Server to not use these coarser grained locks and only use row locks. This is advantages in this case because it stops big chunks of available rows being skipped by the transactions that are looking for available rows.

like image 54
Martin Brown Avatar answered Nov 08 '22 19:11

Martin Brown