Coordinating the execution of a single periodic task between servers in a cluster

Question

(I shall try to keep this question as short as possible while clearly describing the situation. Please comment if anything is missing.)

The situation

I'm running a cluster with three servers in the same datacenter
To ease deployment, each server runs exactly the same application code

The objective

To run a single task (call it Task X) every minute by a single server.

Under these conditions

The cluster remains distributed and highly available
Each server keeps running the same application code. In other words there is no such thing as "deploy code A to a master server and deploy code B to all secondary servers.

The reason I do not wish to distinguish between the kind of server is to maintain high availability (avoid problems when a so called master goes down), redundancy (distribute load), and to avoid creating a complex deployment procedure where I need to deploy different applications to different kinds of servers.

Why is this so difficult? If I were to add code that would execute this task every 5 minutes, then each server would execute it because each server runs the same application code. As such they need to be able to coordinate which server is going to run the same during each tick.

I am able to use distributed messaging mechanisms such as Apache Kafka or Redis. If using such mechanism to coordinate such task, how would such "algorithm" work?

I posed this question to someone else, his reply was to use a task queue. However, this does not seem to solve the issue, because the question remains: which server is going to add the task to the task queue? If all servers would add the task to the queue then it would result in duplicate entries. Moreover, which server is going to execute the next task in the queue? All this needs to be decided through coordination within the cluster, without distinguishing between different kinds of servers.

lastcanal · Accepted Answer

It sounds like you are looking for a distributed lock. Redis does this wonderfully with setnx. If you combine it with expire then you can create global locks that are released every N Seconds.

setnx will only write the value and return true if the key does not already exist. Redis operations are atomic so only the first server to call setnx after the key expires will get the go ahead to run the task.

Here is an example in ruby:

# Attempt to get the lock for 'Task X' by setting the current server's hostname
if redis.setnx("lock:task:x", `hostname`.chomp)
  # Got the lock, now I set it to expire after 5 minutes
  redis.expire("lock:task:x", 60 * 5)
  # This server has the go ahead to execute the task
  execute_task_x
else
  # Failed to get the lock. Another server is doing the work this time around
end

With this you are still reliant on calling one server Redis Master unless you take advantage of redis-sentinel. Take a look at the redis-sentinel docs for information on how to configure automatic fallover.

Coordinating the execution of a single periodic task between servers in a cluster

Tags:

distribution

cluster-computing

task-queue

high-availability

Tom

1 Answers

lastcanal

Recent Activity

Donate For Us

Coordinating the execution of a single periodic task between servers in a cluster

Tags:

distribution

cluster-computing

task-queue

high-availability

Tom

1 Answers

lastcanal

Related questions

Recent Activity

Donate For Us