Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get Number of Nodes in a Distributed system

I'm building a distributed system with unstructured peer to peer overlay. There may be thousand of nodes in that system. Nodes can join the system dynamically(like torrent clients). In the system, each node wants to estimate number of nodes (guess an approximate value) in the system.

I'm currently using a centralized server for counting number of nodes. Each node communicates with this server. This is very inefficient and violates the distributed behaviour.

Is there a way to do this in distributed way without using a centralized server?

like image 289
Sampath Liyanage Avatar asked Mar 25 '15 06:03

Sampath Liyanage


Video Answer


1 Answers

If you only need an estimated count of the number of nodes in the system, and you want to distribute that count across the cluster rather than storing it in a database, gossip protocols are a light weight and efficient method for sharing this type of state between servers.

http://en.m.wikipedia.org/wiki/Gossip_protocol

A simple gossip implementation is: periodically each server selects some random set of neighbors with which to communicate. The server simply sends those neighbors its current state (in this case the count of nodes in the cluster). The property that makes gossip protocols robust is the information spreads like a virus.

You can expand upon this approach and use some sort of logical clock like a Lamport clock or vector clock to handle conflict resolution by essentially versioning the updates. For instance, if node A receives a server count from node B whose version is 10, and later receives a count from node C whose version is 8, node A simply ignores the update from node C since its state was last updated at an earlier logical time than was node B's. This improves the consistency of your server count by preventing servers with an "out-of-date" view of the cluster from overwriting updates from more "up-to-date" servers.

Additionally, you can even use the gossip protocol to perform more robust failure detection. For instance, in the event of a network partition from the view of some portion of the cluster it could appear that a server has died or perhaps just left the cluster voluntarily. Rather than relying on the unreliable network, you can use the gossip protocol to probe the server from multiple points by gossiping information regarding which servers have attempted to contact the suspect server. Then only when a threshold of failures has been reached is the server considered dead.

like image 100
kuujo Avatar answered Oct 29 '22 17:10

kuujo