I'm considering moving from AppEngine to EC2/Elastic Beanstalk because I need my servers located within the EU [AppEngine doesn't offer a server location option AFAIK]. I've run the Elastic Beanstalk sample application, which is good as far as it goes; however, one of the AppEngine features I rely on heavily is the offline task queue / cron facility, as I periodically fetch a lot of data from other sites. I'm wondering what I would need to set up on Elastic Beanstalk / EC2 to replicate this task queue facility, whether there are any best practices yet, how much work it would take, etc.
Thanks!
A potential problem with cron services in Beanstalk is that a given scheduled command might be invoked by more than one service if the application is running on more than one instance. Coordination is needed between the running Tomcat instances to ensure that jobs are run by only one, and that if one of them dies the cron service doesn't get interrupted.
Here is how I'm implementing it:
Each Tomcat instance runs a cron thread that reads the configuration from the WAR and sleeps until the next scheduled invocation. When that time arrives, the instance first attempts to "claim" the invocation by reading the last invocation time for that command from the database, then updating it to get the "lock".
last = query(SELECT last_execution_time FROM crontable WHERE command = ?)
if (NOW() - last < reasonable window) skip;
rows = query(UPDATE crontable SET last_execution_time = NOW() WHERE command = ? AND last_execution_time = last)
if (rows == 0) skip;
run task()
The key element here is that we also include the last_execution_time in the WHERE clause, ensuring that if some other instance updates it between when we SELECT and UPDATE, the update will return that no rows were affected and this instance will skip executing that task.