Disruptor pattern - how are master and slave nodes kept in sync?

In the LMAX Disruptor pattern, the replicator is used to replicate input events from the master node to a slave node. So the setup would probably look like the following:

[Diagram: master and slave nodes, each with its own ring buffer and Business Logic Processor; the master's replicator feeds events to the slave's receiver]

The master node's replicator writes the events to a DB (although we can think of better mechanisms than writing to a DB; it is not very important to the problem statement). The slave node's receiver reads from the DB and puts the events onto the slave node's ring buffer.

The output events of the slave node are ignored.

Now there is a chance that the Business Logic Processor of the master node will be slower than the Business Logic Processor of the slave node. For example, the master node's BLP could be at slot 102, whereas the slave node's could be at 106. (This can occur because the replicator reads each event from the ring buffer before the Business Logic Processor does.)

In such a scenario, if the master node fails and the slave node becomes the new master, a few crucial output events could be missed by the external systems. This could happen because the slave node's output was ignored while it was acting as slave.
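The scenario above can be made concrete with a toy model of the three cursors involved. This is a sketch with assumed semantics, not the real Disruptor API: on the master the replicator runs before the Business Logic Processor, so the slave's BLP can get ahead of the master's.

```java
// Toy model of the ordering described in the question: three sequence
// cursors, where the master's replicator leads its own BLP, and the slave's
// BLP can consume everything that has been replicated.
public class CursorModel {
    long masterReplicator = -1; // last event shipped to the slave
    long masterBlp = -1;        // last event processed on the master
    long slaveBlp = -1;         // last event processed on the slave

    // The replicator ships `events` more events to the slave.
    void masterReplicates(int events) { masterReplicator += events; }

    // The slave's BLP can process anything already replicated.
    void slaveProcessesAll() { slaveBlp = masterReplicator; }

    // The master's BLP advances, but never past the replicator's cursor.
    void masterProcesses(int events) {
        masterBlp = Math.min(masterBlp + events, masterReplicator);
    }
}
```

Replicating 107 events (sequences 0..106) while the master's BLP has only processed 103 of them reproduces the 102-vs-106 split from the question.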

Martin Fowler does state that the job of the replicator is to keep the nodes in sync: "Earlier on I mentioned that LMAX runs multiple copies of its system in a cluster to support rapid failover. The replicator keeps these nodes in sync"

But I am not sure how it can keep the Business Logic Processors in sync. Any ideas?

asked Jan 12 '23 by coder_bro

2 Answers

Replication is directly from the master node to the slave node, not via a database. The replication gates on acknowledgement from the slave.
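The gating idea can be sketched in a few lines. This is a minimal illustration under assumed semantics (not the real Disruptor wiring): the master's Business Logic Processor only advances up to the sequence the slave has acknowledged, so the master can never get ahead of what the slave has durably received.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the master's BLP is gated on the slave's acknowledged sequence.
public class ReplicationGate {
    private long slaveAckedSequence = -1;    // highest sequence the slave has acked
    private long businessLogicSequence = -1; // highest sequence the BLP has processed
    private final List<Long> processed = new ArrayList<>();

    // Called when the slave confirms receipt of events up to `sequence`.
    public void onSlaveAck(long sequence) {
        slaveAckedSequence = Math.max(slaveAckedSequence, sequence);
    }

    // The BLP may only advance up to min(published, slave-acknowledged).
    public void processAvailable(long publishedSequence) {
        long limit = Math.min(publishedSequence, slaveAckedSequence);
        while (businessLogicSequence < limit) {
            businessLogicSequence++;
            processed.add(businessLogicSequence); // apply business logic here
        }
    }

    public long businessLogicSequence() { return businessLogicSequence; }
}
```

With sequences up to 10 published but only 5 acknowledged, the BLP stops at 5; once the ack catches up, it processes the rest. This is why, with gating, the failover gap described in the question cannot open up.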

http://www.infoq.com/presentations/LMAX

The link above goes into more detail, and the comments discussion on the presentation is worth reading too.

answered May 16 '23 by Martin Thompson


If dropped events are low cost, then you can simply ignore them.

As a simple implementation, you could have the output disruptor on the primary notify the slave that it has completed sending a packet. Think of it as a two-stage replicator: one stage to replicate the event, and a second to confirm that the event has been sent.
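The two-stage idea could look something like the following sketch (the class and method names are hypothetical, not from the Disruptor library): stage one replicates the event to the slave; stage two confirms that the master's output disruptor has actually sent it. On failover, the new master re-sends only the unconfirmed outputs.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a two-stage replicator: replicated-but-unconfirmed outputs are
// tracked so a promoted slave knows exactly which outputs to (re)send.
public class TwoStageReplica {
    private final Set<Long> unconfirmed = new HashSet<>();

    // Stage 1: event replicated to the slave; its output not yet confirmed sent.
    public void onReplicated(long sequence) { unconfirmed.add(sequence); }

    // Stage 2: master confirms the output for `sequence` was actually sent.
    public void onOutputConfirmed(long sequence) { unconfirmed.remove(sequence); }

    // On failover, these outputs must be (re)sent by the new master.
    public Set<Long> outputsToResend() { return new HashSet<>(unconfirmed); }
}
```

If the master dies after sending output 101 but before 102 and 103, the promoted slave resends exactly those two, closing the gap from the question.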

In a real-world implementation you may need additional downstream support in your architecture (especially replay/retry). Depending on your application requirements, you need the capability to detect that there is a gap in the output events and fetch them as necessary. Assuming your events are idempotent, there should be no issue with going back and replaying them.
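Downstream gap detection can be sketched as follows, assuming (hypothetically) that each output event carries a monotonically increasing sequence number. A consumer tracks the last sequence it saw and requests a replay of any missing range; because the events are idempotent, receiving a replayed duplicate is harmless.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a downstream consumer that detects sequence gaps and records the
// ranges it needs replayed, while ignoring idempotent duplicates.
public class GapDetector {
    private long lastSeen = -1;
    private final List<long[]> gaps = new ArrayList<>(); // {from, to} inclusive

    // Returns true if the event advances the stream; duplicates return false.
    public boolean onEvent(long sequence) {
        if (sequence <= lastSeen) return false; // replayed duplicate: safe to drop
        if (sequence > lastSeen + 1) {
            gaps.add(new long[] {lastSeen + 1, sequence - 1}); // range to replay
        }
        lastSeen = sequence;
        return true;
    }

    public List<long[]> gapsToReplay() { return gaps; }
}
```

Receiving 0, 1, then 4 flags the range 2..3 for replay; a later replayed 3 is recognised as a duplicate and dropped.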

Suppose a single packet is lost on your outbound channel, or your internet line goes down? Even if an event is successfully sent out from the disruptor, it can still be lost in transit. What failure scenarios you can tolerate depends on your specific application and requires more thought than can go here.

answered May 16 '23 by jasonk