Disruptor pattern - how are master and slave nodes kept in sync?

In the LMAX Disruptor pattern, the replicator is used to replicate input events from the master node to a slave node. So the setup would probably look like the following:

[Diagram: master and slave nodes, each with its own ring buffer and Business Logic Processor; the master's replicator feeds events to the slave's receiver]

The master node's replicator writes the events to a DB (although we can think of better mechanisms than writing to a DB; it is not very important to the problem statement). The slave node's receiver reads from the DB and puts the events onto the slave node's ring buffer.

The output events of the slave node are ignored.

Now there is a chance that the Business Logic Processor of the master node will be slower than the Business Logic Processor of the slave node. For example, the master node's BLP could be at slot 102, whereas the slave node's could be at 106. (This can occur because the replicator reads each event from the ring buffer before the Business Logic Processor does.)

In such a scenario, if the master node fails and the slave node becomes the new master, a few crucial output events could be missed by the external systems. This could happen because the slave node's output was ignored while it was acting as slave.
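The scenario above can be made concrete with a toy model of the three cursors involved. This is a sketch with assumed semantics, not the real Disruptor API: on the master the replicator runs before the Business Logic Processor, so the slave's BLP can get ahead of the master's.

```java
// Toy model of the ordering described in the question: three sequence
// cursors, where the master's replicator leads its own BLP, and the slave's
// BLP can consume everything that has been replicated.
public class CursorModel {
    long masterReplicator = -1; // last event shipped to the slave
    long masterBlp = -1;        // last event processed on the master
    long slaveBlp = -1;         // last event processed on the slave

    // The replicator ships `events` more events to the slave.
    void masterReplicates(int events) { masterReplicator += events; }

    // The slave's BLP can process anything already replicated.
    void slaveProcessesAll() { slaveBlp = masterReplicator; }

    // The master's BLP advances, but never past the replicator's cursor.
    void masterProcesses(int events) {
        masterBlp = Math.min(masterBlp + events, masterReplicator);
    }
}
```

Replicating 107 events (sequences 0..106) while the master's BLP has only processed 103 of them reproduces the 102-vs-106 split from the question.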

Martin Fowler does state that the job of the replicator is to keep the nodes in sync: "Earlier on I mentioned that LMAX runs multiple copies of its system in a cluster to support rapid failover. The replicator keeps these nodes in sync"

But I am not sure how it can keep the Business Logic Processors in sync. Any ideas?

asked Jan 12 '23 by coder_bro

2 Answers

Replication is directly from the master node to the slave node, not via a database. The replication gates on acknowledgement from the slave.
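The gating idea can be sketched in a few lines. This is a minimal illustration under assumed semantics (not the real Disruptor wiring): the master's Business Logic Processor only advances up to the sequence the slave has acknowledged, so the master can never get ahead of what the slave has durably received.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the master's BLP is gated on the slave's acknowledged sequence.
public class ReplicationGate {
    private long slaveAckedSequence = -1;    // highest sequence the slave has acked
    private long businessLogicSequence = -1; // highest sequence the BLP has processed
    private final List<Long> processed = new ArrayList<>();

    // Called when the slave confirms receipt of events up to `sequence`.
    public void onSlaveAck(long sequence) {
        slaveAckedSequence = Math.max(slaveAckedSequence, sequence);
    }

    // The BLP may only advance up to min(published, slave-acknowledged).
    public void processAvailable(long publishedSequence) {
        long limit = Math.min(publishedSequence, slaveAckedSequence);
        while (businessLogicSequence < limit) {
            businessLogicSequence++;
            processed.add(businessLogicSequence); // apply business logic here
        }
    }

    public long businessLogicSequence() { return businessLogicSequence; }
}
```

With sequences up to 10 published but only 5 acknowledged, the BLP stops at 5; once the ack catches up, it processes the rest. This is why, with gating, the failover gap described in the question cannot open up.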

http://www.infoq.com/presentations/LMAX

The link above goes into more detail, and the comments discussion on the presentation is worth reading too.

answered May 16 '23 by Martin Thompson


If dropped events are low cost, then you can simply ignore them.

As a simple implementation, you could have the output disruptor on the primary notify the slave that it has completed sending a packet. Think of it as a two-stage replicator: one stage to replicate the event, and a second to confirm that the event has been sent.
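The two-stage idea could look something like the following sketch (the class and method names are hypothetical, not from the Disruptor library): stage one replicates the event to the slave; stage two confirms that the master's output disruptor has actually sent it. On failover, the new master re-sends only the unconfirmed outputs.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a two-stage replicator: replicated-but-unconfirmed outputs are
// tracked so a promoted slave knows exactly which outputs to (re)send.
public class TwoStageReplica {
    private final Set<Long> unconfirmed = new HashSet<>();

    // Stage 1: event replicated to the slave; its output not yet confirmed sent.
    public void onReplicated(long sequence) { unconfirmed.add(sequence); }

    // Stage 2: master confirms the output for `sequence` was actually sent.
    public void onOutputConfirmed(long sequence) { unconfirmed.remove(sequence); }

    // On failover, these outputs must be (re)sent by the new master.
    public Set<Long> outputsToResend() { return new HashSet<>(unconfirmed); }
}
```

If the master dies after sending output 101 but before 102 and 103, the promoted slave resends exactly those two, closing the gap from the question.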

In a real-world implementation you may need additional downstream support in your architecture (especially replay/retry). Depending on your application requirements, you need the capability to detect that there is a gap in the output events and fetch them as necessary. Assuming your events are idempotent, there should be no issue with going back and replaying them.
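Downstream gap detection can be sketched as follows, assuming (hypothetically) that each output event carries a monotonically increasing sequence number. A consumer tracks the last sequence it saw and requests a replay of any missing range; because the events are idempotent, receiving a replayed duplicate is harmless.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a downstream consumer that detects sequence gaps and records the
// ranges it needs replayed, while ignoring idempotent duplicates.
public class GapDetector {
    private long lastSeen = -1;
    private final List<long[]> gaps = new ArrayList<>(); // {from, to} inclusive

    // Returns true if the event advances the stream; duplicates return false.
    public boolean onEvent(long sequence) {
        if (sequence <= lastSeen) return false; // replayed duplicate: safe to drop
        if (sequence > lastSeen + 1) {
            gaps.add(new long[] {lastSeen + 1, sequence - 1}); // range to replay
        }
        lastSeen = sequence;
        return true;
    }

    public List<long[]> gapsToReplay() { return gaps; }
}
```

Receiving 0, 1, then 4 flags the range 2..3 for replay; a later replayed 3 is recognised as a duplicate and dropped.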

Suppose a single packet is lost on your outbound channel, or your internet line goes down? Even if an event is successfully sent out from the disruptor, it can still be lost in transit. What failure scenarios you can tolerate depends on your specific application and requires more thought than can go here.

answered May 16 '23 by jasonk