Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to know when a set of RabbitMQ tasks are complete?

I am using RabbitMQ to have worker processes encode video files. I would like to know when all of the files are complete - that is, when all of the worker processes have finished.

The only way I can think to do this is by using a database. When a video finishes encoding:

UPDATE videos SET status = 'complete' WHERE filename = 'foo.wmv'
-- etc etc etc as each worker finishes --

And then to check whether or not all of the videos have been encoded:

SELECT count(*) FROM videos WHERE status != 'complete'

But if I'm going to do this, then I feel like I am losing the benefit of RabbitMQ as a mechanism for multiple distributed worker processes, since I still have to manually maintain a database queue.

Is there a standard mechanism for RabbitMQ dependencies? That is, a way to say "wait for these 5 tasks to finish, and once they are done, then kick off a new task?"

I don't want to have a parent process add these tasks to a queue and then "wait" for each of them to return a "completed" status. Then I have to maintain a separate process for each group of videos, at which point I've lost the advantage of decoupled worker processes as compared to a single ThreadPool concept.

Am I asking for something which is impossible? Or, are there standard widely-adopted solutions to manage the overall state of tasks in a queue that I have missed?

Edit: after searching, I found this similar question: Getting result of a long running task with RabbitMQ

Are there any particular thoughts that people have about this?

like image 647
poundifdef Avatar asked Oct 12 '11 02:10

poundifdef


People also ask

How do I know if my RabbitMQ queue is empty?

In php-amqlib: $channel->basic_get(QUEUE_NAME, true); // the second arg is no_ack . The second argument marks that no acknowledgment is expected for that message. That is, you don't have to "flag" the message as read for RabbitMQ to confidently dequeue it.

Why RabbitMQ shows activity on message rates but not on queued messages?

In case RabbitMQ receive non-routable message it drop it. So while message was received, it was not queued. You may configure Alternate Exchanges to catch such messages. Save this answer.

How do I find my RabbitMQ queue size?

The passive argument to queue_declare allows you to check whether a queue exists without modifying the server state, so you can use queue_declare with the passive option to check the queue length. This should be the accepted answer, even if he did miss Basic. Get as the 2nd source of this info.


2 Answers

Use a "response" queue. I don't know any specifics about RabbitMQ, so this is general:

  • Have your parent process send out requests and keep track of how many it sent
  • Make the parent process also wait on a specific response queue (that the children know about)
  • Whenever a child finishes something (or can't finish for some reason), send a message to the response queue
  • Whenever numSent == numResponded, you're done

Something to keep in mind is a timeout -- What happens if a child process dies? You have to do slightly more work, but basically:

  • With every sent message, include some sort of ID, and add that ID and the current time to a hash table.
  • For every response, remove that ID from the hash table
  • Periodically walk the hash table and remove anything that has timed out

This is called the Request Reply Pattern.

like image 64
Brendan Long Avatar answered Sep 20 '22 11:09

Brendan Long


enter image description here

Based on Brendan's extremely helpful answer, which should be accepted, I knocked up this quick diagram which be helpful to some.

like image 31
Ronnie Avatar answered Sep 19 '22 11:09

Ronnie