
How does a node join a Distributed Hash Table (DHT) cluster?

I'm trying to learn about the Distributed Hash Table (DHT) paradigm as it fits into a P2P or fully distributed computing architecture. From a theoretical standpoint, once a cluster is established, it makes a fair amount of sense how it manages to swarm data and distribute work.

The most interesting part to me is that the architecture never requires any kind of centralized controller or coordinator (no single point of failure). However, I'm still struggling to understand the practical execution of the concept, particularly how a cluster is formed. If it's a fully distributed system, how does a node know how to 'join' an already established cluster?

In a simplistic example:

  • Say I'm creating a P2P application based on the DHT model
  • The application is distributed across the Internet (i.e. the clients are not on the same network), and any public client may connect to the cluster
  • A client connected to the cluster can see some (but not necessarily all) of the other clients in the cluster
  • A client who isn't connected doesn't have any addresses or names of clients in the cluster.

So how would a new client 'connect' if there isn't any centralized server to act as a beacon, or otherwise introduce the new client to the cluster?

David Elner asked Oct 23 '12 15:10



2 Answers

This is a problem I covered as part of my dissertation, and I never found a solution I was happy with. The problem is that you need some information about at least one other peer before you can join the network, and getting that first address is the hard bit.

A few ideas I came up with:

  • Encourage peers to publish their addresses, so that publicly accessible lists of known IPs build up
  • Run several "well known" bootstrap peers
  • Brute-force the address space

The last option is the only truly decentralised approach. A combination of the three is likely to be best.
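As a rough illustration of combining these sources, here is a minimal Python sketch that walks several address lists in priority order and stops at the first peer that answers. The function name find_first_peer, the try_contact callback and the idea of passing sources as plain lists are all assumptions made for this example; a real client would plug in its own handshake or ping as try_contact.

```python
from typing import Callable, Iterable, Optional, Tuple

# (host, port) pair identifying a peer; purely illustrative.
Address = Tuple[str, int]


def find_first_peer(
    sources: Iterable[Iterable[Address]],
    try_contact: Callable[[Address], bool],
) -> Optional[Address]:
    """Return the first address that responds, or None if none do.

    `sources` is ordered by preference, e.g. published peer lists,
    then well-known bootstrap peers, then a brute-force scan.
    `try_contact` is a hypothetical callback that returns True when
    the peer at the given address answers.
    """
    seen = set()
    for source in sources:
        for addr in source:
            if addr in seen:
                continue  # do not contact the same address twice
            seen.add(addr)
            if try_contact(addr):
                return addr
    return None
```

Because each source is just an iterable, the brute-force option can be a lazy generator over an address range that is only consumed if every cheaper source has failed.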

Once you're bootstrapped into a network, re-establishing a connection after disconnecting is not hard: simply save the addresses of a couple of thousand nodes in the network that have already been long-lived. At least one of them is likely to still be online next time.
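A minimal sketch of such a peer cache in Python follows. The PeerRecord fields, the JSON file format and the use of "last seen minus first seen" as a measure of how long-lived a node is are all assumptions chosen for illustration, not part of any particular DHT implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PeerRecord:
    # Hypothetical record layout; timestamps are seconds since the epoch.
    host: str
    port: int
    first_seen: float
    last_seen: float


def select_long_lived(peers: List[PeerRecord], limit: int = 2000) -> List[PeerRecord]:
    """Keep the `limit` peers we have observed online for the longest span."""
    by_uptime = sorted(peers, key=lambda p: p.last_seen - p.first_seen, reverse=True)
    return by_uptime[:limit]


def save_peers(path: str, peers: List[PeerRecord], limit: int = 2000) -> None:
    """Write the longest-lived peers to disk before shutting down."""
    with open(path, "w") as f:
        json.dump([asdict(p) for p in select_long_lived(peers, limit)], f)


def load_peers(path: str) -> List[PeerRecord]:
    """Read cached peers on startup; an absent cache means a first run."""
    try:
        with open(path) as f:
            return [PeerRecord(**entry) for entry in json.load(f)]
    except FileNotFoundError:
        return []
```

On the next start the client would try the addresses returned by load_peers first, and fall back to the initial bootstrap methods only if none of them respond.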

Martin answered Oct 13 '22 20:10



From what I can think of, you could create a proxy server for the network of DHT nodes, with shadow servers backing up that proxy for reliability.

Any new node trying to join the DHT network talks to the proxy, and the proxy lets it into the DHT network, which is itself entirely P2P.

This way only the proxy server has to be public, and all the other DHT nodes can keep their IPs private.

This might be a hindrance for you, since the application is distributed across the Internet, but you can always talk via the proxy.

Rohit Keswani answered Oct 13 '22 20:10
