I'm currently working on a distributed system where we have to implement some kind of leader election. The problem is that we would like to avoid that all computers have to know each other - only the leader should know all of them. Is there a fast way, for instance using broadcast, to achieve what we want?
Or do we simply have to know at least one other node to perform a good leader election?
It can be assumed that all computers are on the same subnet.
Thanks for your help.
Definition. The problem of leader election is for each processor eventually to decide whether it is a leader or not, subject to the constraint that exactly one processor decides that it is the leader.
A leader election algorithm describes how a cluster of nodes without a leader can communicate with each other to choose exactly one of themselves to become the leader. The algorithm is executed whenever the cluster starts or when the leader node goes down.
The goal of the Leader Election (LE) algorithm is for a group of processes to select one of the processes to serve as its leader. LE is a classic distributed system problem that arises in many different venues.
An election algorithm determines where a new copy of the coordinator should be started. It assumes that every active process in the system has a unique priority number; the process with the highest priority is chosen as the new coordinator.
The problem is that we would like to avoid that all computers have to know each other - but only the leader.
Leader election is the problem of picking a single leader out of a set of potential leader candidates. Look at it as having two required properties: liveness and safety. Here, liveness would mean "most of the time, there is a leader", while safety would mean "there are either zero or one leaders". Let's consider how we would solve this safety property in your example, using broadcast.
Let's pick a simple (broken) algorithm, assuming every node has a unique ID. Each node broadcasts its ID and listens. When it receives a higher ID than its own, it stops participating. If it receives a lower ID than its own, it broadcasts its own again. Assuming a synchronous network, the last ID everybody receives is the leader's ID. Now, introduce a network partition. The protocol will happily continue on either side of the partition, and two leaders will be elected.
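To make the failure mode concrete, here is a minimal single-process simulation of that broken protocol (all names are illustrative, not from any library). Each round, every still-active node "broadcasts" its ID; any node that hears a higher ID drops out, and the surviving ID is the elected leader. Running the same code on each side of a partition shows the two-leader problem.

```java
import java.util.*;

// Simulation of the broken broadcast election described above.
public class BroadcastElection {
    // Runs rounds until no node drops out; returns the last ID everyone heard.
    static int elect(List<Integer> nodeIds) {
        Set<Integer> active = new HashSet<>(nodeIds); // nodes still participating
        int highestHeard = Integer.MIN_VALUE;
        boolean changed = true;
        while (changed) {
            changed = false;
            // every active node broadcasts its ID; record the highest one heard
            for (int id : active) {
                highestHeard = Math.max(highestHeard, id);
            }
            // nodes that heard a higher ID than their own stop participating
            for (Iterator<Integer> it = active.iterator(); it.hasNext(); ) {
                if (it.next() < highestHeard) {
                    it.remove();
                    changed = true;
                }
            }
        }
        return highestHeard; // the surviving (highest) ID wins
    }

    public static void main(String[] args) {
        // Healthy network: one leader.
        System.out.println(elect(Arrays.asList(3, 7, 1, 5)));
        // Partition {3, 7} | {1, 5}: each side independently elects its own leader.
        System.out.println(elect(Arrays.asList(3, 7)) + " and " + elect(Arrays.asList(1, 5)));
    }
}
```

Both sides of the partition terminate happily with a "unique" leader, which is exactly the safety violation described above.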
That's true of this broken protocol, but it's also true of all possible protocols. How do you tell the difference between nodes you can't communicate with and nodes that don't exist if you don't know (at least) how many nodes exist? So there's a first safety result: you need to know how many nodes exist, or you can't ensure there is only one leader.
Now, let's relax our safety constraint to be a probabilistic one: "there can be zero or more leaders, but most of the time there is one". That makes the problem tractable, and a widely-used solution is gossip (epidemic protocols). For example, see A Gossip-Style Failure Detection Service which discusses a variant of this exact problem. The paper mainly concerns itself with probabilistically correct failure detection and enumeration, but if you can do that you can do probabilistically correct leader election too.
As far as I can tell, you can't have safe non-probabilistic leader election in general networks without at least enumerating the participants.
As an interesting 'distributed mechanics' solution I have seen recently, I'd recommend the Apache ZooKeeper project. It is open source, so at the very least you should be able to get a couple of ideas from it. It is also actively developed, so you may be able to reuse it directly as part of your solution.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
I would recommend JGroups to solve this problem - assuming you are building a system on top of the JVM.
http://www.jgroups.org/
Use the LockService to ensure that only one node in the cluster is the leader at any time. JGroups can be set up to use a Peer Lock or a Central Lock - either should work in your case.
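A rough sketch of the idea (this assumes JGroups is on the classpath and the protocol stack config file, here assumed to be named `lock.xml`, includes a locking protocol such as CENTRAL_LOCK or PEER_LOCK; the cluster and lock names are arbitrary choices):

```java
import java.util.concurrent.locks.Lock;

import org.jgroups.JChannel;
import org.jgroups.blocks.locking.LockService;

// Hedged sketch of leader election via a JGroups distributed lock.
public class JGroupsLeader {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel("lock.xml"); // stack with CENTRAL_LOCK or PEER_LOCK
        channel.connect("leader-cluster");           // all candidates join the same cluster
        LockService lockService = new LockService(channel);
        Lock leaderLock = lockService.getLock("leader"); // one well-known lock name

        leaderLock.lock(); // blocks until this node acquires leadership
        try {
            System.out.println(channel.getAddress() + " is now the leader");
            // ... do leader-only work here ...
        } finally {
            leaderLock.unlock();
            channel.close();
        }
    }
}
```

The nice property here is that JGroups handles membership for you: the nodes only need to share a cluster name (discoverable via multicast on the same subnet), not a full list of each other's addresses, which matches the constraint in the question.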
See http://withmeta.blogspot.com/2014/01/leader-election-problem-in-elastic.html for a Clojure implementation, or http://javabender.blogspot.com.au/2012/01/jgroups-lockservice-example.html for a Java one.