Replicating AppEngine's task queue facility on EC2 / Elastic Beanstalk

I'm considering moving from AppEngine to EC2/Elastic Beanstalk, as I need my servers located within the EU [AppEngine doesn't offer a server location option, AFAIK]. I've run the Elastic Beanstalk sample application, which is good as far as it goes; however, one of the AppEngine features I rely on heavily is the offline task queue / cron facility, as I periodically fetch a lot of data from other sites. I'm wondering what I would need to set up on Elastic Beanstalk / EC2 to replicate this task queue facility, whether there are any best practices yet, and how much work it would take.

Thanks!

Justin asked Nov 15 '22 01:11


1 Answer

A potential problem with cron services in Beanstalk is that a given scheduled command might be invoked more than once if the application is running on more than one instance. Coordination is needed between the running Tomcat instances to ensure that each job is run by only one of them, and that the cron service isn't interrupted if one of them dies.

Here is how I'm implementing it:

  1. Package the cron job "config file" with the WAR. This file should contain frequencies and URLs (as each actual cron is simply an invocation of a specific URL, as AppEngine does it).
  2. Use a single database table to maintain coordination. It requires at least two columns:
    1. a primary or unique key (string) which holds the command along with its frequency (e.g. "@daily http://your-app/some/cron/handler/url").
    2. a second column which holds the last execution time.
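
As a rough illustration of the config file idea, here is a minimal sketch of parsing one line of the form shown above. The class name `CronEntry`, the one-entry-per-line layout, and the `@hourly` / `@weekly` keywords are assumptions made for the example; the answer itself only shows `@daily`.

```java
import java.net.URI;
import java.time.Duration;
import java.util.Map;

/** One line of a hypothetical cron config file: "<frequency> <url>". */
final class CronEntry {
    // Assumed keyword set; only "@daily" appears in the original answer.
    private static final Map<String, Duration> FREQUENCIES = Map.of(
            "@hourly", Duration.ofHours(1),
            "@daily", Duration.ofDays(1),
            "@weekly", Duration.ofDays(7));

    final String command;    // whole line, usable as the unique key in the table
    final Duration interval; // how often the URL should be invoked
    final URI url;           // handler URL to call

    private CronEntry(String command, Duration interval, URI url) {
        this.command = command;
        this.interval = interval;
        this.url = url;
    }

    /** Parses a line such as "@daily http://your-app/some/cron/handler/url". */
    static CronEntry parse(String line) {
        String command = line.trim();
        String[] parts = command.split("\\s+", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<frequency> <url>': " + line);
        }
        Duration interval = FREQUENCIES.get(parts[0]);
        if (interval == null) {
            throw new IllegalArgumentException("unknown frequency: " + parts[0]);
        }
        return new CronEntry(command, interval, URI.create(parts[1]));
    }
}
```

Keeping the whole line as `command` means the same string can serve as the key in the coordination table.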

Each Tomcat instance will run a cron thread which should read the configuration from the WAR and schedule itself to sleep as long as needed until the next service invocation. Once the time hits, the instance should first attempt to "claim" the invocation by grabbing the last invocation time for that command from the database, then updating it to get the "lock".
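
How long the cron thread sleeps is not spelled out here. One possible sketch, assuming the next run is due at the last execution time plus the configured interval (the helper name `delayUntilNext` is hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

final class CronSchedule {
    /**
     * Time the cron thread should sleep before the command is due again.
     * Assumes "due" means lastExecution + interval; returns zero if overdue.
     */
    static Duration delayUntilNext(Instant lastExecution, Duration interval, Instant now) {
        Instant due = lastExecution.plus(interval);
        return due.isAfter(now) ? Duration.between(now, due) : Duration.ZERO;
    }
}
```

A cron thread could pass the result to `Thread.sleep(delay.toMillis())`, or use it as the delay for a `ScheduledExecutorService`, before attempting the claim.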

  1. query(SELECT last_execution_time FROM crontable WHERE command = ?)
  2. if(NOW() - last_execution_time < reasonable window) skip;
  3. query(UPDATE crontable SET last_execution_time = NOW() WHERE command = ? AND last_execution_time = ?)
  4. if(number of rows updated == 0) skip;
  5. run task()

The key element here is that we also include the last_execution_time in the WHERE clause, ensuring that if some other instance updates it between when we SELECT and UPDATE, the update will return that no rows were affected and this instance will skip executing that task.
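
The claim steps can be sketched roughly as follows. To keep the example self-contained, a `ConcurrentHashMap` stands in for the `crontable`; the names `CronTable`, `CronClaim`, and `tryClaim` are made up for illustration. A real version would presumably run the SELECT and UPDATE shown above over JDBC and treat an update count of 1 as success.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for the crontable (assumption; the answer uses a database). */
final class CronTable {
    private final ConcurrentHashMap<String, Instant> lastExecution = new ConcurrentHashMap<>();

    /** Inserts the row for a command if it does not exist yet. */
    void register(String command, Instant initial) {
        lastExecution.putIfAbsent(command, initial);
    }

    /** Mirrors: SELECT last_execution_time FROM crontable WHERE command = ? */
    Instant select(String command) {
        return lastExecution.get(command);
    }

    /**
     * Mirrors: UPDATE crontable SET last_execution_time = ?
     *          WHERE command = ? AND last_execution_time = ?
     * Returns true only if the stored value still equalled 'expected'.
     */
    boolean updateIfUnchanged(String command, Instant expected, Instant now) {
        return lastExecution.replace(command, expected, now);
    }
}

final class CronClaim {
    /** Returns true if this instance won the right to run the task. */
    static boolean tryClaim(CronTable table, String command, Duration window, Instant now) {
        Instant last = table.select(command);                 // step 1
        if (last == null) {
            return false;                                     // no row to claim
        }
        if (Duration.between(last, now).compareTo(window) < 0) {
            return false;                                     // step 2: ran recently
        }
        return table.updateIfUnchanged(command, last, now);   // steps 3 and 4
    }
}
```

A caller would run the task only when `tryClaim` returns true (step 5). The compare-and-set in `updateIfUnchanged` plays the role of the extra `last_execution_time = ?` condition in the WHERE clause.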

Tyson answered Nov 30 '22 05:11