Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DHT: BitTorrent vs kademlia vs clones (python)

I'm in the middle of implementing my own dht for internal cluster. Since it will be used in file-sharing program like bittorrent, "Mainline DHT" was the first thing I was look at. After that I found "entangled" (python, dht using twisted matrix), congress (python, dht using pyev + libev) and of course original "kademlia".

They have different approaches on organizing k-buckets:

1) congress, kademlia use fixed 160 buckets in range 2*i <= (difference for each id from us) < 2*(i+1), for 0 <= i < 160.

2) mainline DHT and entangled use dynamic buckets. On start they have just 1 bucket covering whole space. After it will be filled with 8 alive nodes, bucket will be splitted to 2 new. But ONLY if our own id inside that bucket. If it is not -- bucket will be never splitted. So, soon we will have 160 closest to us buckets and few other.

Both variants are good enough. But I have found HUGE difference in logic which detects belongs some id to some bucket or not. And this is my question.

congress and kademlia treat bucket bundaries as "minimum distance from us" and "maximum distance from us". So, our own ID will be ALWAYS in bucket0. Maximum 2 other ids in bucket1 (because it covers 2*1 <= x < 2*2 distances) will be ALWAYS closest to us. So my brain does not breaks, coz everything OK.

But if you look into Mainline DHT or entangled, you will see what bucket bundaries treated as absolute node id bundaries, not xor distance! So in theoretically full table ids 0,1,2,3,4,5,6,7 will be in 1 bucket.

So. Why some implementations treat bucket boundaries as "max/min distance from us", while others as "max/min 160bit integer value"??

like image 308
Vadim Fint Avatar asked Oct 09 '11 00:10

Vadim Fint


1 Answers

The kademlia paper actually calls out the optimization of dynamically splitting buckets as the routing table grows. There is no logic difference between these two approaches, it's just an optimization to save some space. When implementing a fixed full sized routing table, you have to find k nodes to send requests to. If the bucket your target falls in is empty, or has fewer than k nodes in it, you have to pick from neighboring buckets. Given that, have the closest bucket to you not be split in the first place, makes that search simpler and faster.

as for your point (1), I think you may have misunderstood kademlia. The routing table bucket boundaries are always relative your own node ID. And the ID space the buckets span double for each bucket further away from you. Without this property (if, say each bucket covered an equal range of the ID space) you would not be able to do searches properly, and they would certainly not be log(n).

The mainline DHT implements kademlia.

like image 141
Arvid Avatar answered Oct 25 '22 10:10

Arvid