
How would you architect this message processing system in .NET/SQL Server?

Let's say I've got a SQL Server database table with X (> 1,000,000) records in it that need to be processed (get data, perform an external action, update status in the db) one by one by some worker processes (console apps, Windows services, Azure worker roles, etc.). I need to guarantee each row is only processed once. Ideally exclusivity would be guaranteed no matter how many machines/processes were spun up to process the messages. I'm mostly worried about two SELECTs grabbing the same rows simultaneously.

I know there are better datastores for queuing out there, but I don't have that luxury for this project. I have ideas for accomplishing this, but I'm looking for more.

asked Sep 21 '10 at 22:09 by John Sheehan



1 Answer

I've had this situation.

Add an InProcess column to the table, default = 0. In the consumer process:

UPDATE tbl SET InProcess = @myMachineID WHERE rowID =
    (SELECT MIN(rowID) FROM tbl WHERE InProcess = 0)

Now that machine owns the row, and you can query its data without fear. Usually your next line will be something like this:

SELECT * FROM tbl WHERE rowID =
    (SELECT MAX(rowID) FROM tbl WHERE InProcess = @myMachineID)

You'll also have to add a Done flag of some kind to the row, so you can tell if the row was claimed but processing was incomplete.
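Putting those two columns together, the table might look roughly like this. The names and types are just an illustrative sketch (rowID and payload stand in for whatever the real table holds), not something from the original post:

-- Hypothetical table shape, for illustration only.
-- InProcess = 0 means "unclaimed"; otherwise it holds the claiming machine's ID.
-- Done marks rows whose processing finished successfully.
CREATE TABLE tbl (
    rowID     INT IDENTITY(1,1) PRIMARY KEY,
    payload   NVARCHAR(MAX) NULL,
    InProcess INT NOT NULL DEFAULT 0,
    Done      BIT NOT NULL DEFAULT 0
);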

Edit

The UPDATE gets an exclusive lock (see MSDN). I'm not sure if the SELECT in the subquery is allowed to be split from the UPDATE; if so, you'd have to put them in a transaction.
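If it does turn out that the subquery read can be split from the UPDATE, the wrapped version would look roughly like the sketch below. Note that the UPDLOCK/READPAST hints are my own addition (a common table-as-queue pattern), not something the original answer prescribes:

-- Sketch only: claim the lowest unclaimed row inside one transaction.
-- UPDLOCK takes the update lock at the moment the subquery reads the row;
-- READPAST lets other workers skip rows another worker has already locked.
BEGIN TRANSACTION;

UPDATE tbl
SET InProcess = @myMachineID
WHERE rowID = (
    SELECT MIN(rowID)
    FROM tbl WITH (UPDLOCK, READPAST)
    WHERE InProcess = 0
);

COMMIT TRANSACTION;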

@Will A posts a link which suggests that beginning your batch with this will guarantee it:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

...but I haven't tried it.

@Martin Smith's link also makes some good points, looking at the OUTPUT clause (added in SQL 2005).
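For what it's worth, the OUTPUT clause lets the claim and the data fetch collapse into a single atomic statement, which sidesteps the second SELECT entirely. A minimal sketch, reusing the hypothetical columns from above:

-- Sketch: claim the next unclaimed row and return its data in one statement.
-- OUTPUT is available from SQL Server 2005 onward; inserted.* refers to the
-- row's values after the update has been applied.
UPDATE tbl
SET InProcess = @myMachineID
OUTPUT inserted.rowID, inserted.payload
WHERE rowID = (
    SELECT MIN(rowID)
    FROM tbl
    WHERE InProcess = 0
);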

One last edit

Very interesting exchange in the comments; I definitely learned a few things here. And that's what SO is for, right?

Just for color: when I used this approach back in 2004, I had a bunch of web crawlers dumping URLs-to-search into a table, then pulling their next URL-to-crawl from that same table. Since the crawlers were attempting to attract malware, they were liable to crash at any moment.

answered Nov 15 '22 at 10:11 by egrunin