Why does Kademlia structure its routing table the way it does?

I understand that the Kademlia routing table is made up of 160 buckets.

Nodes are put into buckets 0-159 according to their prefix length, which is the number of leading zero bits in the XOR of the local node's ID and the remote node's ID.

Why is this so? Are there any performance benefits involved (other than the fact that iterating through 160*20 nodes to find the closest would be infeasible)?
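For reference, the bucket index described above can be computed as a shared-prefix count. This is a minimal sketch; the function name and the use of plain Python integers as 160-bit IDs are my own assumptions, not from any particular implementation:

```python
def bucket_index(local_id: int, remote_id: int, bits: int = 160) -> int:
    """Bucket = number of leading zero bits in the XOR distance,
    i.e. the length of the prefix the two IDs share."""
    distance = local_id ^ remote_id
    if distance == 0:
        raise ValueError("a node does not route to itself")
    return bits - distance.bit_length()

# IDs differing in their top bit land in bucket 0 (the most distant half)
print(bucket_index(0, 1 << 159))  # 0
```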

asked Feb 21 '23 by liamzebedee

2 Answers

Kademlia uses the XOR of 2 node IDs as a measure of their distance apart. The idea with the routing table buckets is that a node has detailed knowledge of the network "close" to it and less knowledge the further you get from its ID.

To find a group of nodes closest to a particular key, a node would first query the closest ones it knows about from its own routing table. On average, half of these lookups will fall into the address space covered by bucket 0. But that bucket is only allowed to contain 20 nodes, so there is little chance that these nodes are the actual closest. However, each node in that bucket will have more detailed knowledge of that part of the address space, and will likely be able to provide from its own routing table a better list of close nodes, which can then also be queried, and so on.

In this way, an iterative lookup very quickly homes in on the actual closest group of nodes.

When you say

iterating through 160*20 nodes to find the closest is infeasible

I take it you mean that actually querying each of them would be infeasible, since iterating through such a list to extract the closest ones is exactly how nodes handle lookup requests (RPCs).

Note that in real-world scenarios, it would be very unlikely for the number of buckets to get anywhere near 160. For example, for a network of a billion nodes, the average bucket count would be 27.

As to the actual values chosen in the original Kademlia paper, I don't know why the bucket size is specified as 20. However, I imagine that 160 is derived from the bit-size of a SHA1 hash. Normally a hash function is used to generate the key for a given value to be stored. If the risk of a hash-collision using SHA1 is tolerably low, then this is fine. If not, a different hash algorithm could be used, e.g. SHA256 would yield up to 256 buckets.
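The arithmetic is easy to check: a digest of b bytes gives an 8·b-bit key space, so SHA1 yields 160-bit keys and SHA256 yields 256-bit keys. Python's hashlib is used here purely for illustration:

```python
import hashlib

sha1_bits = len(hashlib.sha1(b"some stored value").digest()) * 8
sha256_bits = len(hashlib.sha256(b"some stored value").digest()) * 8
print(sha1_bits, sha256_bits)  # 160 256
```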

answered Mar 07 '23 by Fraser


Why is this so, is there any performance benefits involved

This organization, combined with the XOR metric, creates tiered locality: it guarantees that querying a node somewhat closer to your target will turn up nodes that are even closer, which yields exponential convergence.

Maybe you can think of it as a distributed interval search.
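The "interval search" intuition is easy to verify on a small ID space: grouping all IDs by the length of the prefix they share with a target, each group is half the size of the previous one, so every extra bit of agreement halves the remaining interval. A toy check with 8-bit IDs (the target value is arbitrary):

```python
from collections import Counter

BITS = 8
target = 0b10110100
sizes = Counter()
for node in range(2 ** BITS):
    if node == target:
        continue
    shared_prefix = BITS - (node ^ target).bit_length()
    sizes[shared_prefix] += 1

print(sorted(sizes.items()))
# [(0, 128), (1, 64), (2, 32), (3, 16), (4, 8), (5, 4), (6, 2), (7, 1)]
```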

I understand that the Kademlia routing table is made up of 160 buckets.

A flat array of (up to) 160 buckets is just a simple approximation of the correct routing table layout that many implementations use.

With bucket splitting or multihomed routing tables you need an actual tree layout, which can contain far more than 160 buckets.
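To illustrate, here is a sketch of the split rule that produces such a tree. This is my own simplified single-homed version (no replacement caches, toy parameters), not code from any real DHT implementation: a full bucket is split into two children only if it covers the local node's own ID; otherwise new contacts are simply dropped.

```python
import random

K, BITS = 8, 16          # toy parameters; real tables use k = 20, 160-bit IDs

class Bucket:
    def __init__(self, prefix, plen):
        self.prefix, self.plen, self.nodes = prefix, plen, []

def covers(bucket, nid):
    # does nid fall in the ID range this bucket is responsible for?
    return bucket.plen == 0 or nid >> (BITS - bucket.plen) == bucket.prefix

def insert(table, nid, own_id):
    b = next(x for x in table if covers(x, nid))
    if nid in b.nodes:
        return
    if len(b.nodes) < K:
        b.nodes.append(nid)
    elif covers(b, own_id):                     # split only the home bucket
        table.remove(b)
        table += [Bucket(b.prefix << 1, b.plen + 1),
                  Bucket((b.prefix << 1) | 1, b.plen + 1)]
        for n in b.nodes:
            insert(table, n, own_id)
        insert(table, nid, own_id)
    # else: bucket is full and far from us -> drop (or keep as a replacement)

random.seed(7)
own_id = random.randrange(2 ** BITS)
table = [Bucket(0, 0)]
for _ in range(3000):
    insert(table, random.randrange(2 ** BITS), own_id)
print(len(table))   # the tree grows deep only around own_id
```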

In fact, here's a small fraction of a tree-based routing table of a multihomed DHT node with 89 node IDs; the complete table is larger than this (these are essentially the regions around two of the 89 IDs):

0000000...   entries:8 replacements:8
0000001000...   entries:8 replacements:8
0000001001000...   entries:8 replacements:8
00000010010010...   entries:8 replacements:8
00000010010011000...   entries:8 replacements:8
000000100100110010...   entries:8 replacements:8
0000001001001100110...   entries:8 replacements:8
00000010010011001110...   entries:8 replacements:8
0000001001001100111100...   entries:5 replacements:0
0000001001001100111101...   entries:8 replacements:0
000000100100110011111...   entries:8 replacements:0
0000001001001101...   entries:8 replacements:8
000000100100111...   entries:8 replacements:8
000000100101...   entries:8 replacements:8
00000010011...   entries:8 replacements:8
000000101...   entries:8 replacements:8
00000011...   entries:8 replacements:8
0000010...   entries:8 replacements:8
0000011000...   entries:8 replacements:8
0000011001000...   entries:8 replacements:8
00000110010010...   entries:8 replacements:8
00000110010011000...   entries:8 replacements:8
000001100100110010...   entries:8 replacements:8
0000011001001100110...   entries:8 replacements:8
00000110010011001110...   entries:8 replacements:5
0000011001001100111100...   entries:6 replacements:0
0000011001001100111101...   entries:2 replacements:0
000001100100110011111...   entries:8 replacements:0
0000011001001101...   entries:8 replacements:8
000001100100111...   entries:8 replacements:8
000001100101...   entries:8 replacements:8
00000110011...   entries:8 replacements:8
000001101...   entries:8 replacements:8
00000111...   entries:8 replacements:8

Its lookup cache is even larger and consists of 7k buckets.

answered Mar 07 '23 by the8472