
How to make active services highly available?

I know that with Network Load Balancing and Failover Clustering we can make passive services highly available. But what about active apps?

Example: one of my apps retrieves some content from an external resource at a fixed interval. I have imagined the following scenarios:

  1. Run it on a single machine. Problem: if this instance fails, the content won't be retrieved.
  2. Run it on each machine of the cluster. Problem: the content will be retrieved multiple times.
  3. Install it on each machine of the cluster, but run it on only one of them. Each instance will have to check some sort of common resource to decide whether it is its turn to do the task.

While thinking about solution #3, I wondered what the common resource should be. I thought of creating a table in the database that the instances could use to obtain a global lock.

Is this the best solution? How do people usually do this?

By the way, it's a C# .NET WCF app running on Windows Server 2008.

Jader Dias asked Apr 16 '10 20:04


2 Answers

Message queues were invented for problems like this. Imagine that all instances of your clustered application listen to a message queue (itself clustered :-)). At some point one instance receives the initial command to download the external resource. If it succeeds, that instance removes the message and posts a new one scheduled for a later execution time equal to 'the run time' + 'interval'. If the instance dies during processing, that's not a problem: after a timeout the message is rolled back onto the queue and another instance can pick it up. A bit of transactions, a bit of message queues.
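The flow described above can be sketched roughly as follows. This is a minimal in-memory simulation, not a real broker: `LeaseQueue`, `poll`, the `visibility_timeout` parameter and the explicit `now` argument are all hypothetical names chosen for illustration. A production setup would use an actual clustered queue and would post the follow-up message and remove the current one in a single transaction.

```python
from itertools import count


class LeaseQueue:
    """Toy queue (hypothetical): a received message is hidden for a while,
    then reappears unless the consumer acknowledged it."""

    def __init__(self, visibility_timeout=60.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}   # message id -> [body, visible_at]
        self._ids = count()

    def post(self, body, visible_at):
        """Add a message that becomes receivable at time `visible_at`."""
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, visible_at]
        return msg_id

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                # Lease the message: other instances cannot see it until
                # the timeout passes, which covers a crashed consumer.
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def ack(self, msg_id):
        """Remove a message after it was processed successfully."""
        self._messages.pop(msg_id, None)


def poll(queue, now, fetch, interval):
    """One polling step of one application instance. Returns True if it ran."""
    received = queue.receive(now)
    if received is None:
        return False              # nothing due, or another instance holds it
    msg_id, body = received
    fetch()                       # if this raises, the message is never acked
    queue.post(body, visible_at=now + interval)   # schedule the next run
    queue.ack(msg_id)             # a real broker would do both atomically
    return True
```

Every instance would call `poll` on its own timer. Only the instance that receives the message does the work, and a message whose consumer died becomes visible again once its lease expires.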

I am on the Java EE side of the world, so I can help you with coding details.

Petre Maierean answered Sep 20 '22 10:09


I once implemented something similar using your solution #3.

Create a table called something like resource_lock, with a column (e.g. locking_key) that will contain a locking key. The table holds a single row whose locking_key is initially null.

Then, at each interval, every instance of your app will:

  1. Run a query like 'update resource_lock set locking_key = 1 where locking_key is null'. (You can of course also store a server-specific id, a timestamp, etc.)
  2. If 0 rows were updated: do nothing, because another app instance is already fetching the resource.
  3. If 1 row was updated: fetch the resource and set locking_key back to null.

There are two advantages to this approach:

  • If one of your servers fails, the resource will still be fetched by the servers that are still running.
  • You leave the locking to the database, which saves you from implementing it yourself.

Eric Eijkelenboom answered Sep 20 '22 10:09