
How to robustly, but minimally, distribute items across a peer-to-peer system

If one has a peer-to-peer system that can be queried, one would like to

  • reduce the total number of queries across the network (by distributing "popular" items widely and "similar" items together)
  • avoid excess storage at each node
  • ensure good availability of even moderately rare items in the face of client downtime, hardware failure, and users leaving (possibly detecting rare items for archivists/historians)
  • avoid queries failing to find matches in the event of network partitions

Given these requirements:

  1. Are there any standard approaches? If not, is there any respected, but experimental, research? I'm somewhat familiar with distribution schemes, but I haven't seen anything that really addresses learning for robustness.
  2. Am I missing any obvious criteria?
  3. Is anybody interested in working on/solving this problem? (If so, I'm happy to open-source part of a very lame simulator I threw together this weekend, and generally offer unhelpful advice).

@cdv: I've now watched the video and it is very good. Although I don't feel it quite gets to a pluggable distribution strategy, it's definitely 90% of the way there. The questions, however, highlight useful differences with this approach that address some of my further concerns, and give me some references to follow up on. Thus, I'm provisionally accepting your answer, although I consider the question open.

John with waffle asked Nov 06 '22 23:11



1 Answer

Several existing systems cover various aspects of what you seek, each making different compromises, including but not limited to:

Amazon's Dynamo: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

Kai: http://www.slideshare.net/takemaru/kai-an-open-source-implementation-of-amazons-dynamo-472179

Hadoop: http://hadoop.apache.org/core/docs/current/hdfs_design.html

Chord: http://pdos.csail.mit.edu/chord/

Beehive: http://www.cs.cornell.edu/People/egs/beehive/

and many others. After building a custom system along those lines, I released some of the building blocks as open source as well: http://code.google.com/p/distributerl/ (that's not a whole system, but a few libraries useful in building one)
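
To make the common thread concrete: Dynamo and Chord both place items with consistent hashing, and Dynamo additionally replicates each item on several successor nodes around the ring. The following is only a minimal sketch of that idea, not code from any of the systems above. The class name `HashRing`, the `vnodes` parameter, and the default of 3 replicas are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (an integer taken from SHA-256)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with successor-based replication."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring to even out load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._points]
        self._node_count = len(set(nodes))

    def replicas(self, item: str, n: int = 3):
        """Return up to `n` distinct nodes, walking clockwise from the item's hash."""
        if not self._points:
            return []
        want = min(n, self._node_count)
        start = bisect.bisect_right(self._keys, _hash(item))
        found = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in found:
                found.append(node)
                if len(found) == want:
                    break
        return found


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("some-item", 3))
```

The property that matters for robustness is that when a node leaves, only the items it held need a new home; the replica lists of all other items stay the same. Popularity-aware replication in the style of Beehive would vary `n` per item rather than fixing it, which this sketch does not attempt.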

Justin Sheehy answered Nov 15 '22 07:11
