Using SQL Server as a DB queue with multiple clients

Tags: sql, database, sql-server, concurrency

Given a table that is acting as a queue, how can I best configure the table/queries so that multiple clients process from the queue concurrently?

For example, the table below indicates a command that a worker must process. When the worker is done, it will set the processed value to true.

    | ID | COMMAND | PROCESSED |
    |  1 | ...     | true      |
    |  2 | ...     | false     |
    |  3 | ...     | false     |

The clients might obtain one command to work on like so:

    select top 1 COMMAND
    from EXAMPLE_TABLE with (UPDLOCK, ROWLOCK)
    where PROCESSED = 0;  -- assumes PROCESSED is a BIT; T-SQL has no FALSE literal

However, if there are multiple workers, each tries to get the row with ID=2. Only the first will get the pessimistic lock, the rest will wait. Then one of them will get row 3, etc.

What query/configuration would allow each worker client to get a different row each and work on them concurrently?

EDIT:

Several answers suggest variations on using the table itself to record an in-process state. I thought that this would not be possible within a single transaction. (i.e., what's the point of updating the state if no other worker will see it until the txn is committed?) Perhaps the suggestion is:

    # start transaction
    update to 'processing'
    # end transaction

    # start transaction
    process the command
    update to 'processed'
    # end transaction

Is this the way people usually approach this problem? It seems to me that the problem would be better handled by the DB, if possible.

asked Sep 04 '10 09:09

Synesso

1 Answer

I recommend you go over Using tables as Queues. Properly implemented queues can handle thousands of concurrent users and sustain as many as half a million enqueue/dequeue operations per minute. Until SQL Server 2005 the solution was cumbersome: it involved mixing a SELECT and an UPDATE in a single transaction with just the right mix of lock hints, as in the article linked by gbn. Luckily, since SQL Server 2005 and the advent of the OUTPUT clause, a much more elegant solution is available, and MSDN now recommends using the OUTPUT clause:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table

Basically there are 3 parts of the puzzle you need to get right in order for this to work in a highly concurrent manner:

1. You need to dequeue atomically. You have to find the row, skip any locked rows, and mark it as 'dequeued' in a single, atomic operation. This is where the OUTPUT clause comes into play:

    WITH CTE AS (
        SELECT TOP(1) COMMAND, PROCESSED
        FROM EXAMPLE_TABLE WITH (READPAST)
        WHERE PROCESSED = 0)
    UPDATE CTE
        SET PROCESSED = 1
        OUTPUT INSERTED.*;

2. You must structure your table with the leftmost clustered index key on the PROCESSED column. If ID was used as the primary key, move it to the second column of the clustered key. Whether to keep a non-clustered index on the ID column is open to debate, but I strongly favor not having any secondary non-clustered indexes on queue tables:

    CREATE CLUSTERED INDEX cdxTable ON EXAMPLE_TABLE (PROCESSED, ID);

3. You must not query this table by any means other than Dequeue. Trying to do Peek operations, or trying to use the table both as a queue and as a store, will very likely lead to deadlocks and will slow down throughput dramatically.
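Putting the three parts together, a minimal sketch could look like the script below. It assumes SQL Server 2016 SP1 or later (for DROP TABLE IF EXISTS and CREATE OR ALTER), reuses EXAMPLE_TABLE from the question, and deliberately starts from a table whose ID is a clustered primary key so the restructuring from the second part is visible. The constraint name PK_EXAMPLE_TABLE, the procedure name usp_DequeueCommand, the column types, and the added ROWLOCK hint and ORDER BY ID are illustrative assumptions, not part of the answer.

```sql
-- Sketch only. Assumes SQL Server 2016 SP1+; PK_EXAMPLE_TABLE and
-- usp_DequeueCommand are hypothetical names, column types are assumed.
DROP TABLE IF EXISTS dbo.EXAMPLE_TABLE;
CREATE TABLE dbo.EXAMPLE_TABLE (
    ID        INT IDENTITY(1,1) NOT NULL
              CONSTRAINT PK_EXAMPLE_TABLE PRIMARY KEY CLUSTERED,
    COMMAND   NVARCHAR(4000)    NOT NULL,
    PROCESSED BIT               NOT NULL DEFAULT 0
);

-- Part 2: replace the clustered primary key on ID with a clustered
-- index led by PROCESSED; no secondary non-clustered index is kept.
ALTER TABLE dbo.EXAMPLE_TABLE DROP CONSTRAINT PK_EXAMPLE_TABLE;
CREATE CLUSTERED INDEX cdxTable ON dbo.EXAMPLE_TABLE (PROCESSED, ID);
GO

-- Parts 1 and 3: the single dequeue entry point for workers.
CREATE OR ALTER PROCEDURE dbo.usp_DequeueCommand
AS
BEGIN
    SET NOCOUNT ON;

    -- Find one unprocessed row, skip rows locked by other sessions
    -- (READPAST) and flag it, all in one UPDATE statement.
    WITH CTE AS (
        SELECT TOP (1) ID, COMMAND, PROCESSED
        FROM dbo.EXAMPLE_TABLE WITH (ROWLOCK, READPAST)
        WHERE PROCESSED = 0
        ORDER BY ID            -- oldest first; not in the answer's snippet
    )
    UPDATE CTE
        SET PROCESSED = 1
        OUTPUT INSERTED.ID, INSERTED.COMMAND;  -- returned to the caller
END;
GO
```

A worker then calls EXEC dbo.usp_DequeueCommand; and gets back either one (ID, COMMAND) row or an empty result set when no unlocked row is pending. Note that dropping the primary key leaves ID without an enforced uniqueness constraint; IDENTITY normally still hands out distinct values, but nothing checks it.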

The combination of an atomic dequeue, the READPAST hint when searching for rows to dequeue, and a clustered index whose leftmost key is the processing bit ensures very high throughput under highly concurrent load.
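On the question's EDIT: other workers do not need to see the uncommitted PROCESSED = 1. The UPDATE holds an exclusive lock on the row until its transaction ends, and READPAST makes other workers skip locked rows instead of waiting on them. That makes it possible to keep the dequeue and the work in one transaction, so a worker that dies before COMMIT leaves its row unprocessed for another worker. The loop below is a sketch of that variant, assuming SQL Server 2016+ and the question's EXAMPLE_TABLE; the @job table variable, the sample commands and the PRINT standing in for real work are hypothetical.

```sql
-- Sketch only. Assumes SQL Server 2016+; sample data and the PRINT
-- placeholder are hypothetical.
SET XACT_ABORT ON;  -- a runtime error rolls back the whole transaction

DROP TABLE IF EXISTS dbo.EXAMPLE_TABLE;
CREATE TABLE dbo.EXAMPLE_TABLE (
    ID        INT IDENTITY(1,1) NOT NULL,
    COMMAND   NVARCHAR(4000)    NOT NULL,
    PROCESSED BIT               NOT NULL DEFAULT 0
);
CREATE CLUSTERED INDEX cdxTable ON dbo.EXAMPLE_TABLE (PROCESSED, ID);

-- Enqueue a few sample commands.
INSERT INTO dbo.EXAMPLE_TABLE (COMMAND) VALUES (N'cmd-1');
INSERT INTO dbo.EXAMPLE_TABLE (COMMAND) VALUES (N'cmd-2');
INSERT INTO dbo.EXAMPLE_TABLE (COMMAND) VALUES (N'cmd-3');

-- Worker loop: one transaction per command.
DECLARE @job TABLE (ID INT, COMMAND NVARCHAR(4000));
DECLARE @id INT, @command NVARCHAR(4000);

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;
    DELETE FROM @job;

    -- Atomic dequeue; the claimed row stays X-locked until COMMIT,
    -- so other workers' READPAST scans skip it.
    WITH CTE AS (
        SELECT TOP (1) ID, COMMAND, PROCESSED
        FROM dbo.EXAMPLE_TABLE WITH (ROWLOCK, READPAST)
        WHERE PROCESSED = 0
        ORDER BY ID
    )
    UPDATE CTE
        SET PROCESSED = 1
        OUTPUT INSERTED.ID, INSERTED.COMMAND INTO @job (ID, COMMAND);

    IF @@ROWCOUNT = 0
    BEGIN
        -- Nothing unlocked is pending: stop this worker.
        COMMIT TRANSACTION;
        BREAK;
    END;

    SELECT @id = ID, @command = COMMAND FROM @job;

    -- Placeholder for the real work. If the session fails here, the
    -- transaction rolls back and the row returns to PROCESSED = 0.
    PRINT CONCAT(N'processing ', @id, N': ', @command);

    COMMIT TRANSACTION;
END;
```

The trade-off is that the row lock and the transaction now last as long as the processing does. Run on its own in autocommit mode, the answer's snippet commits the dequeue immediately, which keeps transactions short but marks a command as processed before the work is actually done.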

answered Sep 25 '22 06:09

Remus Rusanu
