Self organizing applications

I have the following requirement for an application that many people in the office will be using: no server component. All instances of the client app should somehow negotiate between themselves to decide which client will take on the server role. And the clients should communicate between themselves via IP.

If and when the client app acting as the server goes down, another client must take over in a seamless manner. I understand that having a server would be much, much simpler. But because the app must be very resilient, the powers that be do not want to risk the server going down (or even its backup) and would rather rely on this hybrid mesh connectivity, where the server role hops from client to client.

I think I got the app connectivity down. Basically, when the app starts, it announces itself via UDP (either to a predefined IP address that everything listens to or via UDP broadcast). From there on, the communication proceeds in a similar manner.
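As a rough illustration of the startup announcement described above, here is a minimal sketch in Python (the same idea carries over to C# with `UdpClient`). The port number, the JSON message layout, and the function names are all assumptions made for this example; nothing here comes from the actual prototype.

```python
import json
import socket
from typing import Optional

ANNOUNCE_PORT = 50000  # hypothetical port chosen for this sketch


def encode_announce(client_id: str, tcp_port: int) -> bytes:
    """Build the 'here I am' datagram payload (assumed JSON layout)."""
    msg = {"type": "announce", "id": client_id, "tcp_port": tcp_port}
    return json.dumps(msg).encode("utf-8")


def decode_message(payload: bytes) -> Optional[dict]:
    """Parse a received datagram; return None for anything malformed."""
    try:
        msg = json.loads(payload.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return None
    if isinstance(msg, dict) and "type" in msg:
        return msg
    return None


def broadcast_announce(client_id: str, tcp_port: int) -> None:
    """Send the announcement to the LAN broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(encode_announce(client_id, tcp_port),
                    ("255.255.255.255", ANNOUNCE_PORT))
```

Every receiver should run incoming datagrams through something like `decode_message`, since anything on the LAN can send to a broadcast port.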

The part I am having problems with is how to negotiate/self-organize between the clients to pick one with a server role. And how to reliably detect that a client has gone down and then a new negotiation must take place. The final difficulty is replicating data that's been accumulated by the client with a server role.

I created a prototype in C# that communicates and tries to replicate data, but the negotiation part (particularly when coupled with a client failure) is where I am stuck.

Initially I thought that's what ZeroConf (aka Bonjour) did. But that just announces available network services.

Anyway, I do not want to reinvent the wheel, and I can't be the first person to want to do this. So my questions:

  • Is there a pattern that already implements what I described above?
  • If so, is there an available .NET (or even native) library for this?
  • What are good ways to negotiate server role among the clients?
AngryHacker asked Feb 06 '12 18:02


3 Answers

Selection of a server amongst a group of machines, whether those machines are also clients or not, is an extremely nontrivial problem. It's called leader election. The seminal work that you should be reading is Leslie Lamport's "The Part-Time Parliament", which describes the Paxos protocol. Google has used Paxos to build a system called Chubby, which serves the purpose that you describe.

That said, you should probably look at a system like Apache ZooKeeper, which is an open-source (albeit Java) implementation of distributed leader election and, more broadly, distributed lock management, and which has been thoroughly tested under massive load. The Hadoop distributed data storage and computing platform, and specifically HBase, a distributed database that runs on Hadoop, make heavy use of ZooKeeper to decide "who is in charge" amongst a group of servers. This way any of them can go down, and the others decide amongst themselves who takes over the job.

As I mentioned earlier, leader election is fraught with error. It's very hard to get right. I've implemented Paxos "for fun" a half-dozen times in C#, and all of my implementations have bugs in them.

Chris Shain answered Oct 01 '22 22:10


So, you currently have a system in which each client on a LAN will announce itself via UDP to the rest of the LAN. One of the client apps is a "server" and has some additional command & control powers on top of being a client in itself.

This is not a new idea, to be sure. What you want is to add some additional talking during the initial "here I am" connection communication. When a new client yells "here I am" to the rest of the LAN, if there is a server, the server should say "Welcome, I'm the server", and the new client app now knows which client is acting as the server. All other clients should probably say "hi" as well. If there isn't a server, the new client should first repeat the "hello" (it is UDP after all; you have to expect some messages not to be received), and if nobody responds, this new client is the only one on the network and is the "server" by default. If there are others but none are claiming to be the server, the clients can "discuss amongst themselves" to determine a new server.
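The join handshake described above can be sketched roughly as follows, in Python for brevity. The `send_hello` and `collect_replies` callables are hypothetical stand-ins for the real network code, and the three-attempt default and the "lowest ID wins if several servers answer" rule are assumptions of this sketch, not part of the description.

```python
from enum import Enum


class Role(Enum):
    CLIENT = "client"      # a server already exists; follow it
    SERVER = "server"      # nobody answered; we are alone
    ELECTION = "election"  # peers exist but none claims to be server


def decide_role_after_join(replies):
    """replies: list of (client_id, is_server) tuples heard after 'here I am'."""
    servers = [cid for cid, is_server in replies if is_server]
    if servers:
        # Assumption: if several peers claim the role, follow the lowest ID.
        return Role.CLIENT, min(servers)
    if not replies:
        return Role.SERVER, None
    return Role.ELECTION, None


def join(send_hello, collect_replies, attempts=3):
    """Repeat the hello a few times, since UDP datagrams can be lost."""
    replies = []
    for _ in range(attempts):
        send_hello()
        replies = collect_replies()
        if replies:
            break
    return decide_role_after_join(replies)
```

Keeping the decision in a pure function like `decide_role_after_join` makes it easy to unit test without touching the network.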

In addition to this, the server copy should periodically (maybe every 3-5 seconds) yell out "I'm still here" to everyone; this is known as a "heartbeat" message and is a very common alternative to the two-way "ping" method of verification.
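On the receiving side, each client needs a way to notice that heartbeats have stopped. A minimal sketch, assuming a 4-second interval (within the 3-5 second range suggested above) and a tolerance of two missed heartbeats, which is an arbitrary choice for this example:

```python
import time


class HeartbeatMonitor:
    """Tracks the server's heartbeats and flags a suspected failure."""

    def __init__(self, interval=4.0, missed_allowed=2, clock=time.monotonic):
        # Suspect the server only after more than `missed_allowed`
        # consecutive heartbeats failed to arrive.
        self.timeout = interval * (missed_allowed + 1)
        self.clock = clock  # injectable so the logic can be tested
        self.last_seen = clock()

    def record_heartbeat(self):
        """Call whenever an 'I'm still here' message arrives."""
        self.last_seen = self.clock()

    def server_suspected_down(self):
        """True once the silence has lasted longer than the timeout."""
        return self.clock() - self.last_seen > self.timeout
```

Tolerating a couple of missed heartbeats before reacting avoids triggering an election every time a single datagram is dropped.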

If the server app (or any copy, really) closes normally, it should yell out "goodbye everyone, figure out who's the next server". The remaining clients can then discuss amongst themselves. If the client acting as server crashes, the clients will miss the server's "heartbeat" message and ask "who's the server"; if still nobody responds, the clients will discuss amongst themselves.

Now, clients "discussing amongst themselves" can be as simple or as complex as you like. The simplest would be for whichever client says "OK, I'm server now" to become server. You would probably have to include some sort of timestamp in the message so that if another computer says it at the same time the clients can say "well, client 15 said it first so we're going with him". Alternatively, clients can "vote": each client talks to all the others to determine the nominal latency between itself and each of them, and then "votes" for its lowest-latency peer (no client may vote for itself unless it discovers it's the only one). Most votes wins.
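Both "discussion" styles above can be sketched in a few lines of Python. The tie-breaking rules (lowest client ID on equal timestamps or equal vote counts) are assumptions added for this example, since every client must reach the same answer from the same inputs. Note also that timestamps from different machines are only comparable if their clocks roughly agree.

```python
from collections import Counter


def pick_claim_winner(claims):
    """claims: list of (timestamp, client_id) 'I'm server now' messages.

    Earliest claim wins; on identical timestamps the lowest ID wins.
    """
    return min(claims)[1]


def cast_vote(my_id, latencies):
    """latencies: {peer_id: measured_seconds}. Returns the ID voted for.

    A client votes for its lowest-latency peer and may vote for itself
    only when it knows of no other peer.
    """
    others = {p: lat for p, lat in latencies.items() if p != my_id}
    if not others:
        return my_id
    return min(others, key=lambda p: (others[p], p))


def tally(votes):
    """Most votes wins; ties go to the lowest ID (assumed rule)."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(p for p, c in counts.items() if c == best)
```

For example, `pick_claim_winner([(10.2, "client-15"), (10.5, "client-3")])` picks `"client-15"` because its claim carries the earlier timestamp.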

Or, a server can, as part of its "heartbeat" message, say "if I go down, my successor is client X"; and if a heartbeat is missed and the subsequent "are you still there, server" messages from the clients are not responded to, the clients can say "the king is dead! Long live King Client X!".
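A designated-successor heartbeat might look something like the sketch below. The message layout and the rule the server uses to pick its successor (lowest client ID) are assumptions for illustration; any rule works as long as the successor is named in the heartbeat.

```python
def build_heartbeat(server_id, known_clients):
    """Heartbeat payload that also names who takes over if the server dies."""
    candidates = sorted(c for c in known_clients if c != server_id)
    successor = candidates[0] if candidates else None  # assumed rule: lowest ID
    return {"type": "heartbeat", "server": server_id, "successor": successor}


def next_server(last_heartbeat, alive_clients):
    """Called once the server is confirmed dead.

    Returns the designated successor if it is still alive, otherwise None,
    meaning the clients must fall back to a full election.
    """
    successor = last_heartbeat.get("successor")
    if successor in alive_clients:
        return successor
    return None
```

The fallback matters: the named successor may have gone down together with the server, for example if both sat behind the same switch.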

Understand that, by necessity, this layer of governance within an all-client system for picking an "authoritative" client to become the server is going to dramatically increase the complexity of client communications. Also, while your use of UDP lends itself to speedy communication, UDP gives no delivery guarantee: datagrams can be dropped, especially when many clients talk at once, and the sender is never told. So, I would recommend the use of TCP instead of UDP for any communication in this software in which it is necessary for a particular client to be heard. That includes any direct interrogation of a client ("are you still there, server?"), whatever process you use to have the clients decide who the new server is, and so on.
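A direct "are you still there, server?" check over TCP can be as small as the sketch below. It treats a successful connection as proof of life, which is a simplification; a real check would probably exchange an application-level message as well. The function name and the one-second timeout are assumptions.

```python
import socket


def server_is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to the server can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Unlike a missed UDP heartbeat, a refused or timed-out TCP connection is an explicit signal, so it is a reasonable last check before starting an election.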

KeithS answered Sep 30 '22 22:09


Why do you need to negotiate the role of server at all? Think about this for a second. If each "client" can handle the "server" duties for work initiated at that client, then every instance acts as both client and server, to an extent. Then the only issue is negotiating replication of the persisted state between clients and handling concurrency when two clients are working on the same bit of state. The hardest part, from my perspective, would be alerting other clients that state has changed when one client "saves" the data, and giving the other clients working on that state a way to resolve clashes. This may not be an issue if last in always wins, but that is rare.

If you are truly going to mesh, then each client should be able to handle its own work independently of the others and only communicate changes to persisted state so the copies match.

The above assumes that the number of times more than one person uses the same state at the same time is limited. If that is the normal scenario, then you will have to figure out some of your "server" logic.
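To make the replication idea concrete, here is a minimal Python sketch in which every client keeps a versioned copy of the state, broadcasts its changes, and detects a clash when two clients edited the same version. The class, its per-key version counter, and the "applied" / "conflict" / "ignored" results are all assumptions of this example; how a conflict is then resolved is left open, as it is above.

```python
class ReplicatedStore:
    """Per-client copy of the shared state with simple clash detection."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def local_write(self, key, value):
        """Save locally and return the change to send to the other clients."""
        version = self.data.get(key, (0, None))[0] + 1
        self.data[key] = (version, value)
        return {"key": key, "version": version, "value": value}

    def apply_remote(self, change):
        """Apply a change received from another client."""
        key = change["key"]
        current_version, current_value = self.data.get(key, (0, None))
        if change["version"] > current_version:
            self.data[key] = (change["version"], change["value"])
            return "applied"
        if (change["version"] == current_version
                and change["value"] != current_value):
            # Both sides edited the same version: someone must resolve it.
            return "conflict"
        return "ignored"  # stale or duplicate change
```

If two clients both start from version 1 of a key and each saves a different value, each receives the other's version 2 and gets `"conflict"` rather than silently overwriting its own edit.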

Gregory A Beamer answered Oct 02 '22 22:10