
How to reliably process a queue?

Take this conceptually simple task: consuming a queue, sending an email for each entry.

A simple approach would be:

while true:
   entry = queue.pop()
   sendMail(entry)

The problem here is, if the consumer crashes after popping but before/during sending the mail, a mail is lost. So you change it to:

while true:
   entry = queue.peek()
   sendMail(entry)
   queue.pop()

But now, if the consumer crashes after mailing, but before popping, the mail will be sent again when the consumer comes back up.
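
The two failure modes above can be reproduced with a small in-memory simulation. This is only a sketch: the `Crash` exception, the `simulate` helper and the choice of crashing on entry "b" are hypothetical stand-ins for a real consumer process dying, and appending to the `sent` list stands in for sending the mail.

```python
from collections import deque


class Crash(Exception):
    """Hypothetical stand-in for the consumer process dying."""


def run_pop_first(queue, sent, crash_on):
    # Variant 1: pop, then send. A crash in between loses the entry.
    while queue:
        entry = queue.popleft()
        if entry == crash_on:
            raise Crash(entry)
        sent.append(entry)  # stands in for sendMail(entry)


def run_peek_first(queue, sent, crash_on):
    # Variant 2: peek, send, then pop. A crash before the pop repeats the entry.
    while queue:
        entry = queue[0]
        sent.append(entry)  # stands in for sendMail(entry)
        if entry == crash_on:
            raise Crash(entry)
        queue.popleft()


def simulate(run, crash_on):
    queue, sent = deque(["a", "b", "c"]), []
    try:
        run(queue, sent, crash_on)
    except Crash:
        pass  # the consumer went down; the queue itself survives
    run(queue, sent, None)  # the consumer comes back up and does not crash again
    return sent


lost = simulate(run_pop_first, "b")
duplicated = simulate(run_peek_first, "b")
print(lost)        # ['a', 'c']            -> "b" was never sent
print(duplicated)  # ['a', 'b', 'b', 'c']  -> "b" was sent twice
```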

What's the best-practice way of handling this problem?

Sending email is just an example here, which can be substituted for any mission-critical task. Also assume that the popping of the queue is the only record of the mail having been sent, so the mail subsystem does not record anything itself.

Bart van Heukelom asked Apr 12 '16 09:04


2 Answers

Doesn't your requirement amount to solving the Two Generals' Problem, which has no deterministic solution over an unreliable channel? https://en.wikipedia.org/wiki/Two_Generals%27_Problem

Peek - Process - Remove

You want to remove an entry only once you have ensured it was processed successfully, and you also want to ensure the removal itself succeeds. But any of those messages can be lost, and any of the programs can crash, at any step.

Most robust messaging queues rely on a set of acks + repeated tries (deliveries) to get the desired behavior (until the acks come back).
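
As a rough sketch of "acks + repeated tries": the sender keeps redelivering a message until an ack comes back. Everything named here (`deliver_until_acked`, `FlakyConsumer`, `max_attempts`, `lost_acks`) is hypothetical and not taken from any particular MQ. The consumer below does the work on every delivery but loses its first two acks.

```python
def deliver_until_acked(message, consumer, max_attempts=5):
    """Redeliver `message` until the consumer's ack arrives.

    `consumer(message)` returns True when its ack reached the sender and
    False when the ack was lost. Returns the number of deliveries made.
    """
    for attempt in range(1, max_attempts + 1):
        if consumer(message):
            return attempt
    raise RuntimeError(f"no ack for {message!r} after {max_attempts} attempts")


class FlakyConsumer:
    """Processes every delivery, but loses the first `lost_acks` acks."""

    def __init__(self, lost_acks):
        self.lost_acks = lost_acks
        self.processed = []

    def __call__(self, message):
        self.processed.append(message)  # the work is done on every delivery
        if self.lost_acks > 0:
            self.lost_acks -= 1
            return False  # the ack never reached the sender
        return True


consumer = FlakyConsumer(lost_acks=2)
attempts = deliver_until_acked("mail-1", consumer)
print(attempts)            # 3
print(consumer.processed)  # ['mail-1', 'mail-1', 'mail-1']
```

The message is never lost, but it is processed three times. Lowering `max_attempts` trades those duplicates for the risk of giving up on a message.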

But it's actually impossible to guarantee perfect behavior in every scenario. You just have to weigh the odds and make an engineering compromise between repeated (at least attempted) processing and "never" losing a message (which would need infinite memory etc.), specific to your actual application needs. Again, not a new problem :), and it's unlikely you'll need to write code from scratch for it. Like I mentioned, most MQs solve this exact problem.

Vivek answered Oct 20 '22 14:10


I am proposing two solutions here. The first is a proposed design (can be elaborated after further brainstorming) based on my experience and the second one is a short and quick solution. Have a look, ponder and you can choose whichever suits you.

THE LONG WAY OUT – CREATE IT FROM SCRATCH

If you are planning to create a fault-tolerant and a highly-available queue system, you would have to address the major challenge you are facing.

How to ensure that no messages are lost?

Know your producers and consumers: In order to design a solution, we first need to know our producers and consumers. There are three scenarios: single producer, single consumer; single producer, multiple consumers; and multiple producers, multiple consumers. The best approach would be to create a mechanism which caters to multiple producers and multiple consumers and which, in addition, is configurable to cater to any of the three scenarios.

Next question: how do we do that? The simple answer is to create a configurable mechanism which can take multiple messages and broadcast them to multiple consumers. That mechanism can also read the configuration, validate messages (optional, you can do this in the consumers too), store messages for a short duration, track acknowledgements, decompose one message into many, aggregate many messages into one, and hold an 'action plan' for timeouts or failures and carry that 'action plan' out.

Elaborating the mechanism: Let us call this mechanism a Broker. In your solution, the broker will be placed as shown in the diagram. The solid arrows are messages, and the dotted ones are acknowledgements.
[Figure: Broker block diagram]

I am not going into the detailed design of a broker here, as that would be out of scope.

Handling failures: Identify the possible points of failure:
1. Producers
2. Consumers
3. Broker
4. Network

Producer failures: If the producers are replicated, the alternate producers keep sending messages and functionality is not impacted, although throughput may be affected until the original producer is up and running again.

For Consumer Failures and Network Failures, the broker can maintain a mechanism which will keep the messages until an acknowledgement (let us call it ack, for the sake of brevity) is received. Once the ack is received, the message corresponding to the ack is removed.
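
A minimal in-memory sketch of that bookkeeping, under stated assumptions: the `Broker` class and its `publish`, `deliver`, `ack`, `requeue_unacked` and `pending` methods are hypothetical names, there is a single consumer, and failure detection (for example an ack timeout) is reduced to an explicit `requeue_unacked()` call.

```python
import itertools


class Broker:
    """Keeps every message until the matching ack is received."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._messages = {}      # msg_id -> payload, removed only on ack
        self._ready = []         # msg_ids waiting to be delivered, in order
        self._in_flight = set()  # msg_ids delivered but not yet acked

    def publish(self, payload):
        msg_id = next(self._ids)
        self._messages[msg_id] = payload
        self._ready.append(msg_id)
        return msg_id

    def deliver(self):
        if not self._ready:
            return None
        msg_id = self._ready.pop(0)
        self._in_flight.add(msg_id)
        return msg_id, self._messages[msg_id]

    def ack(self, msg_id):
        # Only now is the message corresponding to the ack removed.
        self._in_flight.discard(msg_id)
        self._messages.pop(msg_id, None)

    def requeue_unacked(self):
        # Called when a consumer or network failure is detected.
        self._ready = sorted(self._in_flight) + self._ready
        self._in_flight.clear()

    def pending(self):
        return len(self._messages)


broker = Broker()
broker.publish("mail to alice")
broker.publish("mail to bob")

first = broker.deliver()   # consumer receives the message, then crashes before acking
broker.requeue_unacked()
again = broker.deliver()   # the same message is delivered again
broker.ack(again[0])
nxt = broker.deliver()
broker.ack(nxt[0])
print(first, again, nxt, broker.pending())
```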

The consumer has to deal with this scenario a little differently. Let us say that the consumer keeps the following variables in its state: (a) the last received message, and (b) the consumer state, one of ACTIVE, DORMANT or RE-STARTED.

The moment the consumer is started, its state is RE-STARTED. The last received message is updated with every message received from the broker, and the state is changed to ACTIVE. If the consumer tries to send an ack to the broker and the connection times out, or there are issues with the network, the state changes to DORMANT, and it is preserved. In both the RE-STARTED and DORMANT states, the consumer checks whether processing of the last received message is done. If yes, it sends the ack to the broker again and waits for the next message. The moment the next message is received, the state changes to ACTIVE and processing continues as normal.
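
The consumer states described above could be sketched roughly as follows. Several things here are assumptions of mine rather than part of the design: the `store` dict stands in for durable storage that survives a restart, the `done` flag is how "processing is done" gets recorded, and a message found unfinished during recovery is simply processed again before it is acked.

```python
from enum import Enum


class State(Enum):
    RESTARTED = "RE-STARTED"
    ACTIVE = "ACTIVE"
    DORMANT = "DORMANT"


class Consumer:
    def __init__(self, send_ack, process, store=None):
        # `store` stands in for durable storage that survives a restart (assumption).
        self.store = store if store is not None else {"last": None, "done": False}
        self.store["state"] = State.RESTARTED
        self.send_ack = send_ack
        self.process = process

    def on_message(self, message):
        # Every received message updates the state and makes the consumer ACTIVE.
        self.store.update(last=message, done=False, state=State.ACTIVE)
        self.process(message)
        self.store["done"] = True
        self._ack(message)

    def recover(self):
        # In RE-STARTED or DORMANT, settle the last received message first.
        if self.store["state"] is State.ACTIVE or self.store["last"] is None:
            return
        if not self.store["done"]:
            # Assumption: unfinished work is redone before acking.
            self.process(self.store["last"])
            self.store["done"] = True
        self._ack(self.store["last"])

    def _ack(self, message):
        try:
            self.send_ack(message)
        except TimeoutError:
            self.store["state"] = State.DORMANT  # preserved in the store


acks, processed = [], []
fail_next_ack = [True]


def send_ack(message):
    # Hypothetical ack channel whose first call times out.
    if fail_next_ack[0]:
        fail_next_ack[0] = False
        raise TimeoutError("broker unreachable")
    acks.append(message)


consumer = Consumer(send_ack, processed.append)
consumer.on_message("mail-1")                  # processed, but the ack times out
state_after_timeout = consumer.store["state"]  # State.DORMANT
consumer.recover()                             # already done, so only the ack is resent
print(processed, acks)                         # ['mail-1'] ['mail-1']
```

Note that this only avoids a duplicate if the `done` flag is written reliably. A crash between processing and recording the flag still leads to reprocessing.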

The broker, on the other hand, just keeps the last sent message until the ack is received. To overcome failures of the broker itself, a master-slave configuration can be set up in which the state of the broker is replicated and messages are redirected to the other broker if the first one becomes unavailable.

THE SHORT SOLUTION:

Use an existing messaging system such as JMS, like @Marcin proposed. I have personally worked with RabbitMQ (http://previous.rabbitmq.com/v3_4_x/features.html) and feel that for most distributed computing scenarios, it will just work. You can configure high availability (http://previous.rabbitmq.com/v3_4_x/ha.html), and it also comes with a nice user interface where you can monitor your queues and messages.

You are, however, encouraged to check out a messaging system that suits your needs.

Hope this helps

Amal Gupta answered Oct 20 '22 12:10