Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How would you implement a working queue in etcd

I have just started to look into etcd and one of the use cases that is mentioned in the talks given by the creators, is a work queue system.

But how would you actually implement this?

Basic pattern would be something like this.

1 process generating "work description tickets", and place that ticket in a folder of etcd lets say "/queue/worktickets/00000000001/"

1->many processes listening to the "/queue/worktickets/" folder for changes. when a new work ticket appears every process wil try to grab the ticket by creating a new value in "/queue/locks/00000001" to lock that ticket. Only the first one will be able to create the lock value.

The process that created the lock ticket does it's work, and then removes the ticket from the queue, and maybe the lock value. Then try to grab the next available ticket from the queue. If no more tickets available, start listening to changes in the "/queue/worktickets/" folder again.

In my head this should be fairly simple to implement, but if the queue gets large(ticket is easy to generate but hard to process) then it seems there will be a lot of data being transeferd from etcd to each of the clients. To my knowledege there is no way of saying give me the first value in this folder that does not exist in this folder nor is there some give me top n items from folder.

Anyone care to share their thougths on this.

like image 897
PEtter Avatar asked Sep 18 '25 18:09

PEtter


1 Answers

So I came up with a solution that I think is robust. Here are the goals that I designed my solution for:

  • It should be efficient (i.e. O(1) network roundtrips) for a worker to retrieve an item from the work queue.
  • If a worker dies or otherwise fails while it's processing an item, the item becomes available to other workers.

So the idea is to have two queues: a pending queue and a running queue. Originally all items are in the pending queue.

When a worker tries to retrieve an item, it transfers it from the pending queue to the running queue using a transaction (available in etcd 3). In the same transaction, the worker also creates a lock for the item. The lock is guarded by a lease, so it's automatically removed if the worker dies.

If a worker successfully finishes processing the item, it removes the item from the running queue and we are done. If the worker fails, the lock will expire, and the item is left in the running queue.

Thus, workers are also expected to look into the running queue once the pending queue has been exhausted. The expectation is that the running queue will be small comparing to the pending queue, thus finding an item that's currently not locked (by simply listing the running queue) is not going to be expensive.

Or as @dannysauer mentioned in the comments, you could also have another fault-tolerant process that transfers items from the running queue back to the pending queue.

like image 96
Derek Chiang Avatar answered Sep 23 '25 07:09

Derek Chiang