
Design Guidelines Distributed computing

Tags:

c#

.net

system

I have a software system that performs OCR on multiple machines simultaneously. The current system works as follows:

  1. All documents that need OCR are inserted into a table in the database.
  2. Each client OCR machine polls that table, and whenever it finds data to OCR, it locks the table and picks n files to process. Locking is used for atomicity.
  3. After each document has been OCRed, its status is updated to complete.

I know it is a serious mistake to use a database as a synchronization point. The system runs fine, but I sometimes see deadlocks on the database.

So my question is: what is a better way to design such a system? I want the database to be a storage device only, not a synchronization point. I would like to hear your thoughts.

crypted asked Sep 29 '10 06:09



2 Answers

Well, you could have a column in the table which says whether the record is currently being processed. Within a transaction, fetch the data for a record which isn't currently being processed, and update the record to say that it's now being processed. The details of how contention will be handled there will depend on the kind of transactions you create and the database you use, but I suspect that transactions should be at the heart of it.
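As a rough illustration of the "claim inside a transaction" idea, here is a minimal sketch in Python using SQLite. Everything named here is an assumption made for the example, not part of the original system: the `documents` table, its `status` and `claimed_by` columns, and the helper names. `BEGIN IMMEDIATE` is SQLite-specific; other databases would use their own row-locking mechanisms, and the contention behaviour will differ accordingly.

```python
import sqlite3


def make_db(doc_count):
    """Create an in-memory database with a hypothetical 'documents' table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "claimed_by TEXT)"
    )
    for _ in range(doc_count):
        conn.execute("INSERT INTO documents DEFAULT VALUES")
    return conn


def claim_next_document(conn, worker_id):
    """Claim one pending document in a single transaction.

    Returns the claimed document id, or None if nothing is pending.
    """
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so the SELECT and
    # the UPDATE below cannot interleave with another worker's claim.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM documents "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE documents SET status = 'processing', claimed_by = ? "
            "WHERE id = ?",
            (worker_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Each worker would call `claim_next_document` in a loop and mark the row as complete when the OCR finishes. Only the row being claimed is touched, rather than locking the whole table for a batch.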

That's assuming you really want to use a database rather than a message queue of some description. You might consider using a message queue in conjunction with the database... and some databases have queues built into them, which could be useful too. Even if you wanted the record in the database as well, you could have a queue just of the IDs - clients could just pull the next item from the queue, then fetch the data. You may still want to record the time at which the item was pulled from the queue, so that if the client crashes or something like that, a batch job can put any failed jobs (e.g. ones which were picked up a day ago but don't have results yet) back in the queue.
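To make the "queue of IDs plus a pull timestamp" idea concrete, here is a small in-process sketch in Python. It is only a stand-in for a real message queue: the `OcrJobQueue` class and its method names are invented for this example, and a production setup would use a proper queueing product and persist the pull times (for instance in the database).

```python
import queue
import threading
import time


class OcrJobQueue:
    """Hypothetical queue of document IDs that remembers when each was pulled."""

    def __init__(self):
        self._pending = queue.Queue()
        self._pulled_at = {}  # doc_id -> time the item was pulled
        self._lock = threading.Lock()

    def enqueue(self, doc_id):
        self._pending.put(doc_id)

    def pull(self, now=None):
        """Return the next document ID, or None if the queue is empty."""
        try:
            doc_id = self._pending.get_nowait()
        except queue.Empty:
            return None
        with self._lock:
            self._pulled_at[doc_id] = time.time() if now is None else now
        return doc_id

    def complete(self, doc_id):
        """Forget a job once its OCR result has been stored."""
        with self._lock:
            self._pulled_at.pop(doc_id, None)

    def requeue_stale(self, max_age_seconds, now=None):
        """Put back jobs that were pulled too long ago and never completed."""
        now = time.time() if now is None else now
        with self._lock:
            stale = [
                doc_id
                for doc_id, pulled in self._pulled_at.items()
                if now - pulled > max_age_seconds
            ]
            for doc_id in stale:
                del self._pulled_at[doc_id]
        for doc_id in stale:
            self._pending.put(doc_id)
        return stale
```

A client would call `pull()`, fetch the document data by ID, run OCR, and then call `complete()`. A periodic batch job could call something like `requeue_stale(24 * 60 * 60)` to recover work from clients that crashed.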

Jon Skeet answered Sep 30 '22 15:09



Instead of polling the database for files to OCR, it is better to use Windows message queuing. With polling, if the database is down while the OCR service is running, the service cannot start work until the database is back up. With a message queue, the OCR service can get the information about each file from the queue (online or offline), so it starts automatically once the machine is up, and there is no risk of deadlocks on the database.

Syntax answered Sep 30 '22 16:09
