
Parallel processing of database queue

We have a small system that uses a database table as a queue on MSSQL 2005. Several applications write to this table, and one application reads and processes the entries in FIFO order.

I need to extend it into a distributed system in which several processing applications can run at once. The goal is for 2-10 processing applications to run concurrently without interfering with each other's work.

My idea is to extend the queue table with a column showing that a process is already working on a row. The processing application first updates the table with its identifier, and then selects the rows it has just marked.

So something like this:

begin transaction
update top (10) queue set processing = 'myid' where processing is null
select * from queue where processing = 'myid'
commit transaction

After processing, it sets the processing column of the table to something else, like 'done', or whatever.

I have three questions about this approach.

First: can this work in this form?

Second: if it works, is it efficient? Do you have any other ideas for distributing the work like this?

Third: In MSSQL, locking is row-based, but once a certain number of rows are locked, the lock is escalated to the whole table. The second application then cannot access the table until the first application commits its transaction. How large can the selection (top x) be so that only row locks are taken and the whole table is not locked?

Biri asked Jan 24 '23 00:01



2 Answers

This will work, but you will probably run into blocking or deadlocks when multiple processes try to read or update the same data. I wrote a procedure to do exactly this for one of our systems; it uses some interesting locking semantics to ensure that this type of thing runs with no blocking or deadlocks, described here.
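The linked write-up is not reproduced on this page, so the following is not necessarily the procedure referred to above. It is a minimal sketch of one commonly used way to claim rows without workers blocking each other on SQL Server 2005, assuming the queue table has an ascending id column (a hypothetical name, not given in the question) alongside the processing column. The READPAST hint makes a worker skip rows that another worker currently has locked, and the OUTPUT clause returns the claimed rows from the same statement, so no separate select is needed.

```sql
-- Sketch only: the id column and the @worker / @batch names are assumptions.
DECLARE @worker varchar(50);
DECLARE @batch int;
SET @worker = 'myid';
SET @batch = 10;

-- Pick the oldest unclaimed rows, skipping any that another worker has locked.
WITH next_rows AS (
    SELECT TOP (@batch) *
    FROM queue WITH (ROWLOCK, UPDLOCK, READPAST)
    WHERE processing IS NULL
    ORDER BY id
)
-- Mark them as ours and return them in a single atomic statement.
UPDATE next_rows
SET processing = @worker
OUTPUT inserted.*;
```

Because the claim is a single statement, it runs as its own short transaction and needs no explicit begin/commit. One trade-off: with READPAST, strict FIFO order across workers is no longer guaranteed, since a worker may skip an older row that is locked and take a newer one.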

Greg Beech answered Feb 04 '23 17:02



This approach looks reasonable to me, and is similar to one I have used in the past - successfully.

Also, the row or table is only locked while the update and select operations take place, so I doubt the row vs table question is really a major consideration.

Unless the processing overhead of your app is so low as to be negligible, I'd keep the "top" value low - perhaps just 1. Of course that entirely depends on the details of your app.
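As a rough illustration of the "top 1" suggestion, a worker could claim a single row, process it outside any transaction, and then mark it finished. This is only a sketch: the id column is a hypothetical key that the question does not mention, and 'myid' and 'done' are the placeholder values from the question. As far as I know, SQL Server only attempts lock escalation when a single statement holds on the order of 5,000 locks, so a batch of one or ten rows should stay well within row locks.

```sql
-- Sketch only: the id column is an assumed key on the queue table.
-- In a real worker the application would keep the claimed id itself;
-- a table variable only lives for the duration of one batch.
DECLARE @claimed TABLE (id int);

-- Claim one unprocessed row; this statement is its own short transaction.
UPDATE TOP (1) queue WITH (ROWLOCK, READPAST)
SET processing = 'myid'
OUTPUT inserted.id INTO @claimed
WHERE processing IS NULL;

-- ... the application processes the claimed row here, holding no locks ...

-- Mark the row as finished so that no worker picks it up again.
UPDATE queue
SET processing = 'done'
WHERE id IN (SELECT id FROM @claimed);
```

Note that UPDATE TOP does not guarantee which row is picked, so this variant does not preserve FIFO order by itself. If @claimed comes back empty, the queue had no unclaimed rows and the worker can wait before polling again.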

Having said all that, I'm not a DBA, so I will also be interested in any more expert answers.

philsquared answered Feb 04 '23 18:02
