Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Kademlia XOR metric properties purposes

Tags:

In the Kademlia paper by Petar Maymounkov and David Mazières, it is said that the XOR distance is a valid non-Euclidian metric with limited explanations as to why each of the properties of a valid metric are necessary or interesting, namely:

  • d(x,x) = 0
  • d(x,y) > 0, if x != y
  • forall x,y : d(x,y) = d(y,x) -- symmetry
  • d(x,z) <= d(x,y) + d(y,z) -- triangle inequality

Why is it important for a metric to have these properties in general? Why is each of these properties necessary in the context of routing queries in the Kademlia Distributed Hash Table implementation?

In addition, the paper mentions that unidirectionality (for a given x, and a distance l, there exist only a single y for which d(x,y) = l) guarantees that all queries will converge along the same path. Why is that so?

like image 232
Erick Avatar asked Sep 09 '14 19:09

Erick


1 Answers

I can only speak for Kademlia, maybe someone else can provide a more general answer. In the meantime...

  • d(x,x) = 0
  • d(x,y) > 0, if x != y

These two points together effectively mean that the closest point to x is x itself; every other point is further away. (This may seem intuitive, but other aspects of the XOR metric aren't.)

In the context of Kademlia, this is important since a lookup for node with ID x will yield that node as the closest. It would be awkward if that were not the case, since a search converging towards x might not find node x.

  • forall x,y : d(x,y) = d(y,x)

The structure of the Kademlia routing table is such that nodes maintain detailed knowledge of the address space closest to them, and exponentially decreasing knowledge of more distant address space. In short, a node tries to keep all the k closest contacts it hears about.

The symmetry is useful since it means that each of these closest contacts will be maintaining detailed knowledge of a similar part of the address space, rather than a remote part.

If we didn't have this property, it might be helpful to think of the search as more like the hands of a clock moving in one direction round a clockface. The node at 1 o'clock (Node1) is close to Node2 at 2 o'clock (30°), but Node2 is far from Node1 (330°). So imagine we're looking for the two closest to 3 o'clock (i.e. Node1 and Node2). If the search reaches Node2, it won't know about Node1 since it's far away. The whole lookup and topology would have to change.

  • d(x,z) <= d(x,y) + d(y,z)

If this weren't the case, it would be impossible for a node to know which contacts from its routing table to return during a lookup. It would know the k closest to the target, but there would be no guarantee that one of the other more distant contacts wouldn't yield a shorter overall path.

Because of this property and unidirectionality, different searches starting from vastly separated points will tend to converge down the same path.

The unidirectionality means that no two nodes can have the same distance from a given point. If that weren't the case, then the target point could be encircled by a bunch of nodes all the same distance from it. Then various different searches would be free pick any of those to pass through. However, unidirectionality guarantees that exactly one of this bunch will be the closest, and any search which chooses between this group will always select the same one.

like image 161
Fraser Avatar answered Oct 18 '22 01:10

Fraser