
How to set up a CRON Task to only run once per set of instances?

We are hosting our website in AWS. We currently have 3 EC2 instances in one cluster, using the AWS load balancer.

The servers run Linux, Apache, Java, MySQL, and Tomcat 6.0.

We are making a decision on how to set up a task to run every hour. The obvious place to do this is in the Java code, but there is one problem.

The problem is that since we have 3 identical instances in the cluster, the task will run 3 times every hour (once per instance) instead of once per hour.

I have a few ideas to overcome this but was hoping that there is a better, possibly industry-standard, way to manage this.

One idea is to store in the DB that the task has already run. Before running, the task would check whether it has already run. I see bugs there though.

The other idea was to use cron on one of the instances, in the native OS, outside of the code in Tomcat. Cron would use wget to call a webpage, which in turn calls a Java method. Since the request would only reach one of the instances, the task should only run once.

Both ways seem like hacks and prone to bugs. Is there a real way to do this?

UpHelix asked Nov 02 '11 23:11


1 Answer

I've used the cron/wget solution and it's actually a reasonable way to solve the problem. Your system administrators will appreciate being able to control it.

Another solution is to use a JVM system property to indicate which of your instances is the one that runs the jobs. For example: -DschedulerEnabled=true. Only set that flag on one of the instances and have the job scheduling code only run if that flag is set.
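A minimal sketch of the flag check, assuming a plain `ScheduledExecutorService` does the hourly scheduling. The class name `HourlyJobBootstrap` and the method names are hypothetical; only the `schedulerEnabled` property name comes from the example above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: starts the hourly task only on the instance
// launched with -DschedulerEnabled=true.
class HourlyJobBootstrap {

    // Boolean.getBoolean reads the JVM system property and returns false
    // when the property is absent or not equal to "true" (case-insensitive).
    static boolean schedulerEnabled() {
        return Boolean.getBoolean("schedulerEnabled");
    }

    // Returns the running executor on the flagged instance, or null on
    // every other instance, where nothing is scheduled.
    static ScheduledExecutorService startIfEnabled(Runnable hourlyTask) {
        if (!schedulerEnabled()) {
            return null;
        }
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // First run happens immediately, then once every hour.
        executor.scheduleAtFixedRate(hourlyTask, 0, 1, TimeUnit.HOURS);
        return executor;
    }
}
```

In a Tomcat deployment this would typically be called once at application startup, and the returned executor shut down when the application stops.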

Finally, Quartz supports your DB-based solution with its Clustering feature. The advantage of this is that it's a real HA solution. With the other solutions, if the machine that is acting as the job scheduler goes down, you have to manually fail over to another machine.
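As a rough sketch of what the Quartz clustering setup involves, the settings below are the standard Quartz property keys for a clustered JDBC job store, built here as a plain `java.util.Properties` object so the snippet runs without Quartz on the classpath. The class name `ClusteredQuartzConfig`, the scheduler name `HourlyJobs`, and the data source name passed in are assumptions for illustration; the connection settings for the data source itself are omitted.

```java
import java.util.Properties;

// Hypothetical helper that assembles clustered Quartz settings.
class ClusteredQuartzConfig {

    static Properties clusteredProperties(String dataSourceName) {
        Properties p = new Properties();
        // Every node in the cluster must use the same instanceName...
        p.setProperty("org.quartz.scheduler.instanceName", "HourlyJobs");
        // ...and a unique instanceId; AUTO lets Quartz generate one per node.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        // Jobs and triggers are stored in the shared database.
        p.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", dataSourceName);
        // Turns on clustering, so each trigger firing is claimed by one node.
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        return p;
    }
}
```

With Quartz available, these properties would be handed to `StdSchedulerFactory` on each instance. All instances point at the same database, which must contain the Quartz tables, and the servers' clocks should be kept in sync.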

sourcedelica answered Oct 03 '22 12:10