 

Scheduling MapReduce jobs for MongoDB

This is more of an implementation question: are there any shortcomings to using something as simple as cron to schedule tasks such as MapReduce jobs for MongoDB? Say something needs to run every hour; cron seems like a suitable way to do that. I'm only asking because of all the popular job-queuing systems out there, such as Resque.

I suppose my question is really: does cron provide a solid and reliable enough solution? Thoughts?

Asked by JP Silvashy, Jun 08 '11 20:06


1 Answer

Cron has been used for decades and is quite reliable and solid; if your cron isn't reliable, I'd suggest that a stern discussion with your OS vendor is in order. Also, the MongoDB documentation talks about cron jobs (search Google for "site:mongodb.org cron" for examples), so cron jobs are presumably an expected way to run MongoDB tasks.

That said, if you already have infrastructure set up for another scheduling system, there's probably no reason to use cron for MongoDB and something else for your other tasks.

In any case, you'll probably want to layer a simple PID-file locking scheme on top if your cron jobs might run long enough to overlap and you want only one running at a time:

  • When the cron job starts, it looks for a PID file.
  • If it finds the file, it reads the old job's PID from the file and checks whether that process is still running.
    • If the old job is still running, the new one complains and exits.
    • If the old job isn't running, the new one continues.
  • Once the new job decides it is okay to start, it writes its own PID to the PID file.
  • When the new job finishes, it deletes the PID file immediately before exiting (or uses an atexit handler or whatever similar feature your environment supports).
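The steps above can be sketched in Python roughly as follows. This is a minimal sketch assuming a POSIX system, where `os.kill(pid, 0)` probes whether a process exists without actually sending it a signal. The lock-file path `/tmp/mongo_mapreduce.pid` and the function names `pid_is_running`, `acquire_pid_lock`, `release_pid_lock`, and `main` are hypothetical illustrations, not part of cron or MongoDB.

```python
import atexit
import os
import sys

# Hypothetical lock-file location; pick a path the cron user can write to.
PID_FILE = "/tmp/mongo_mapreduce.pid"


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        # os.kill(0, ...) would target the whole process group, so reject it.
        return False
    try:
        # Signal 0 only performs error checking; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def release_pid_lock(path: str = PID_FILE) -> None:
    """Delete the PID file, but only if it still holds our own PID."""
    try:
        with open(path) as f:
            if f.read().strip() == str(os.getpid()):
                os.remove(path)
    except FileNotFoundError:
        pass


def acquire_pid_lock(path: str = PID_FILE) -> bool:
    """Write our PID to `path` unless a still-running job already owns it."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        old_pid = None  # no file, or garbage inside it: treat as stale

    if old_pid is not None and old_pid != os.getpid() and pid_is_running(old_pid):
        return False  # the previous run is still active

    with open(path, "w") as f:
        f.write(str(os.getpid()))

    # Remove the PID file automatically when this process exits normally.
    atexit.register(release_pid_lock, path)
    return True


def main() -> int:
    """Hypothetical entry point for the hourly job."""
    if not acquire_pid_lock():
        print("previous job still running; exiting", file=sys.stderr)
        return 1
    # ... run the MongoDB map/reduce work here ...
    return 0
```

To use it as a cron job, the script would end with `if __name__ == "__main__": sys.exit(main())` and be scheduled with an hourly crontab entry such as `0 * * * * /usr/bin/python3 /path/to/job.py` (the path is hypothetical). Note that the check-then-write is not atomic, so two jobs started at the same instant could both pass the check; for runs an hour apart that is rarely a concern, and if it matters, holding a kernel-level lock with `fcntl.flock` on the file closes that gap.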
Answered by mu is too short, Sep 23 '22 08:09