 

Synchronize one queue instance with multiple Redis instances

The Scenario: We have multiple nodes distributed geographically on which we want to have queues collecting messages for that location. And then we want to send this collected data from every queue in every node to their corresponding queues in a central location. In the central node, we will pull out data collected in the queues (from other nodes), process it and store it persistently.

Constraints:

  • Data is very important to us. Therefore, we have to make sure that we are not losing data in any case.
  • Therefore, we need persistent queues on every node so that even if the node goes down for some random reason, when we bring it up we have the collected data safe with us and we can send it to the central node where it can be processed.
  • Similarly, if the central node goes down, the data must remain at all the other nodes so that when the central node comes up we can send all the data to the central node for processing.
  • Also, the data on the central node must not get duplicated or stored twice. That is, data collected on one of the nodes should be stored on the central node only once.
  • The data that we are collecting is very important to us and the order of data delivery to the central node is not an issue.

Our Solution: We considered a couple of solutions, of which I am going to list the one we thought would be best. A possible solution (in our opinion) is to use Redis to maintain the queues everywhere, because Redis provides persistent storage. Then we would have a daemon running on each of the geographically separated nodes which reads data from the queue and sends it to the central node. On receiving the data, the central node sends an ACK to the node it received the data from (because the data is very important to us), and only on receiving the ACK does the node delete the data from its queue. Of course, there will be a timeout period within which the ACK must be received.
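The send/ACK/delete loop above can be sketched in plain Python. Everything here is a hypothetical illustration (the `Node`, `Central`, `collect`, and `flush` names are not from any library); in production the `pending` list would live in persistent storage such as a Redis list:

```python
# In-memory sketch of the ACK protocol described above. Re-sending until
# an ACK arrives gives at-least-once delivery; a unique id per message
# lets the central node deduplicate, satisfying the store-only-once
# constraint.
import uuid

class Node:
    """A geographically separated node that collects messages."""

    def __init__(self):
        self.pending = []  # messages collected but not yet ACKed by central

    def collect(self, payload):
        # Tag each message with a unique id so the central node can
        # deduplicate re-sends after a lost ACK.
        self.pending.append({"id": str(uuid.uuid4()), "payload": payload})

    def flush(self, central):
        # Send everything still pending; delete locally only after an ACK.
        # If no ACK arrives (central down, timeout), the message stays
        # queued and is simply re-sent on the next flush.
        for msg in list(self.pending):
            if central.receive(msg):
                self.pending.remove(msg)

class Central:
    """The central node: stores each message exactly once, then ACKs."""

    def __init__(self):
        self.seen = set()   # ids already stored, used for deduplication
        self.store = []     # stand-in for the persistent store

    def receive(self, msg):
        if msg["id"] not in self.seen:
            self.seen.add(msg["id"])
            self.store.append(msg["payload"])
        return True  # ACK; duplicates are acknowledged but not re-stored
```

The key design point is that the ACK alone cannot prevent duplicates (an ACK can be lost after the central node has already stored the message), which is why the dedup id on the central side is needed.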

The Problem: The solution stated above will (in our opinion) work fine, but the issue is that we don't want to implement the whole synchronization protocol ourselves, for the simple reason that we might get it wrong. We were unable to find this particular kind of synchronization in Redis. So we are open to other message queues such as RabbitMQ (AMQP-based), ZeroMQ, etc. Again, we were unable to figure out whether we can do this with those solutions.

  • Do these Message Queues or any other data store provide features that can be the solution to our problem? If yes, then how?
  • If not, then is our solution good enough?
  • Can anyone suggest a better solution?
  • Can there be a better way to do this?
  • What would be the best way to make it fail safe?
asked Oct 08 '22 by vaidik


1 Answer

You could do this with RabbitMQ by setting up the central node (or a cluster of nodes) to be a consumer of messages from the other nodes, and by using the message acknowledgement feature. With this feature, the central node(s) can acknowledge delivery, so the other nodes delete a message only after the ack. See for example: http://www.rabbitmq.com/tutorials/tutorial-two-python.html

If you have further questions please email the mailing list rabbitmq-discuss.

answered Oct 13 '22 by alexis