Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distributed algorithm design

I've been reading Introduction to Algorithms and started to get a few ideas and questions popping up in my head. The one that's baffled me most is how you would approach designing an algorithm to schedule items/messages in a queue that is distributed.

My thoughts have lead me to browsing Wikipedia on topics such as Sorting,Message queues,Sheduling, Distributed hashtables, to name a few.

The scenario: Say you wanted to have a system that queued messages (strings or some serialized object for example). A key feature of this system is to avoid any single point of failure. The system had to be distributed across multiple nodes within some cluster and had to consistently (or as best as possible) even the work load of each node within the cluster to avoid hotspots.

You want to avoid the use of a master/slave design for replication and scaling (no single point of failure). The system totally avoids writing to disc and maintains in memory data structures.

Since this is meant to be a queue of some sort the system should be able to use varying scheduling algorithms (FIFO,Earliest deadline,round robin etc...) to determine which message should be returned on the next request regardless of which node in the cluster the request is made to.

My initial thoughts I can imagine how this would work on a single machine but when I start thinking about how you'd distribute something like this questions like:

How would I hash each message?

How would I know which node a message was sent to?

How would I schedule each item so that I can determine which message and from which node should be returned next?

I started reading about distributed hash tables and how projects like Apache Cassandra use some sort of consistent hashing to distribute data but then I thought, since the query won't supply a key I need to know where the next item is and just supply it... This lead into reading about peer to peer protocols and how they approach the synchronization problem across nodes.

So my question is, how would you approach a problem like the one described above, or is this too far fetched and is simply a stupid idea...?

Just an overview, pointers,different approaches, pitfalls and benefits of each. The technologies/concepts/design/theory that may be appropriate. Basically anything that could be of use in understanding how something like this may work.

And if you're wondering, no I'm not intending to implement anything like this, its just popped into my head while reading (It happens, I get distracted by wild ideas when I read a good book).

UPDATE

Another interesting point that would become an issue is distributed deletes.I know systems like Cassandra have tackled this by implementing HintedHandoff,Read Repair and AntiEntropy and it seems to work work well but are there any other (viable and efficient) means of tackling this?

like image 210
zcourts Avatar asked Oct 05 '11 19:10

zcourts


1 Answers

Overview, as you wanted

There are some popular techniques for distributed algorithms, e.g. using clocks, waves or general purpose routing algorithms.

You can find these in the great distributed algorithm books Introduction to distributed algorithms by Tel and Distributed Algorithms by Lynch.

Reductions

are particularly useful since general distributed algorithms can become quite complex. You might be able to use a reduction to a simpler, more specific case.

If, for instance, you want to avoid having a single point of failure, but a symmetric distributed algorithm is too complex, you can use the standard distributed algorithm of (leader) election and afterwards use a simpler asymmetric algorithm, i.e. one which can make use of a master.

Similarly, you can use synchronizers to transform a synchronous network model to an asynchronous one.

You can use snapshots to be able to analyze offline instead of having to deal with varying online process states.

like image 162
DaveFar Avatar answered Nov 12 '22 00:11

DaveFar