Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Worker pools and multi-tenant queues with RabbitMQ

I work on a web application that is a multi-tenant cloud based application (lots of clients, each with their own separate "environment", but all on shared sets of hardware) and we're introducing the ability for a user to batch up work for later processing. The types of batched work really isn't important, it's just of sufficient quantity that doing it without a work queue isn't really practical. We've selected RabbitMQ as our underlying queue framework.

Because we're a multi-tenant app, we don't necessarily want clients to be able to cause lengthy queue process times for another client, so one idea that we've floated up is creating a queue on a per client basis and having a shared worker pool pointed across ALL our client queues. The problem is that, to the best that I can figure, workers are directly bound to a specific queue, not an exchange. In our ideal world, our client queues will still be processed, without one client blocking another, from a shared worker pool that we can grow or shrink as necessary by launching more workers or closing down idle ones. Having workers tied to a specific queue prevents us from this in a practical sense, as we'd frequently have lots of workers just idling on a queue with no activity.

Is there a relatively straight forward to accomplish this? I'm fairly new to RabbitMQ and haven't really been able to accomplish what we're after. We also don't want to have to write a very complex multithreaded consumer application either, that's a time sink in dev and test time that we likely can't afford. Our stack is Windows/.Net/C# based if that's germaine, but I don't think that should have a major bearing in the question at hand.

like image 339
bakasan Avatar asked Nov 28 '11 20:11

bakasan


2 Answers

You could look at the priority queue implementation (which wasn't implemented when this question was originally asked): https://www.rabbitmq.com/priority.html

If that doesn't work for you, you could try some other hacks to achieve what you want (which should work with older versions of RabbitMQ):

You could have 100 queues bound to a topic exchange and set the routing key to a hash of the user ID % 100, i.e. each task will have a key between 1 and 100 and tasks for the same user will have the same key. Each queue is bound with a unique pattern between 1 and 100. Now you have a fleet of workers which start with a random queue number and then increment that queue number after each job, again % 100 to cycle back to queue 1 after queue 100.

Now your worker fleet can process up to 100 unique users in parallel, or all the workers can focus on a single user if there is no other work to do. If the workers need to cycle through all 100 queues between each job, in the scenario that only a single user has lot of jobs on a single queue, you're naturally going to have some overhead between each job. A smaller number of queues is one way to deal with this. You could also have each worker hold a connection to each of the queues and consume up to one un-acknowledged message from each. The worker can then cycle through the pending messages in memory much faster, provided the un-acknowledged message timeout is set sufficiently high.

Alternatively you could create two exchanges, each with a bound queue. All work goes to the first exchange and queue, which a pool of workers consume. If a unit of work takes too long the worker can cancel it and push it to the second queue. Workers only process the second queue when there's nothing on the first queue. You might also want a couple of workers with the opposite queue prioritization to make sure long running tasks are still processed when there's a never ending stream of short tasks arriving, so that a users batch will always be processed eventually. This won't truly distribute your worker fleet across all tasks, but it will stop long running tasks from one user holding up your workers from executing short running tasks for that same user or another. It also assumes you can cancel a job and re-run it later without any problems. It also means there will be wasted resources from tasks that timeout and need to be re-run as low priority. Unless you can identify fast and slow tasks in advance

The first suggestion with the 100 queues could also have a problem if there are 100 slow tasks for a single user, then another user posts a batch of tasks. Those tasks won't get looked at until one of the slow tasks is finished. If this turns out to be a legitimate problem you could potentially combine the two solutions.

like image 82
LaserJesus Avatar answered Oct 04 '22 08:10

LaserJesus


You can just have your pool of workers all consume the same unique queue. Work will then be distributed across them and you'll be able to grow/shrink your pool in order to increase/decrease your work processing capacity.

like image 40
David Dossot Avatar answered Oct 04 '22 09:10

David Dossot