 

What's the best way to organize worker processes in Rails?

I frequently have some code that should be run either on a schedule or as a background process with some parameters. The common element is that they are run outside the dispatch process, but need access to the Rails environment (and possibly the parameters passed in).

What's a good way to organize this and why? If you like to use a particular plugin or gem, explain why you find it convenient--don't just list a plugin you use.

Asked Jul 09 '09 by Yehuda Katz


3 Answers

I really don't like gems like delayed_job and background_job that persist to a database for the purpose of running asynchronous jobs. It just seems dirty to me. Transient stuff doesn't belong in a database.

I'm a huge fan of message queues for dealing with asynchronous tasks, even when you don't have the need for massive scalability. The way I see it, message queues are the ideal "lingua franca" for complex systems. With a message queue, in most cases, you have no restriction on the technologies or languages involved in whatever it is you're building. The benefits of low-concurrency message queue usage are probably most noticeable in an "enterprisey" environment where integration is always a massive pain. Additionally, message queues are ideal when your asynchronous workflow involves multiple steps. RabbitMQ is my personal favorite.
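For concreteness, here's roughly what pushing a message onto a RabbitMQ queue looks like from Ruby using the Bunny gem. The gem choice, connection defaults, and queue name are my assumptions; the answer doesn't name a client:

    require 'bunny'   # RabbitMQ client gem

    conn = Bunny.new  # connects to localhost with default credentials
    conn.start
    channel = conn.create_channel
    queue   = channel.queue('index_requests', durable: true)

    # Hand the URI off to whichever worker pops it next
    channel.default_exchange.publish('http://example.com/',
                                     routing_key: queue.name,
                                     persistent:  true)
    conn.close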

For example, consider the scenario where you're building a search engine. People can submit URIs to be indexed. Obviously, you don't want to retrieve and index the page in-request, so you build around a message queue: the form submission target takes the URI and throws it on the message queue to be indexed. The next available spider process pops the URI off the queue, retrieves the page, finds all the links, pushes each unknown one back onto the queue, and caches the content. Finally, a message is pushed onto a second queue for the indexer process, which pops it off and indexes the cached content. Oversimplified, of course; search engines are a lot of work, but you get the idea.
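A minimal sketch of the spider side of that loop, again assuming Bunny and made-up queue names (a real spider needs robots.txt handling, deduplication, proper HTML parsing, and so on):

    require 'bunny'
    require 'open-uri'

    conn = Bunny.new
    conn.start
    channel  = conn.create_channel
    spider_q = channel.queue('index_requests', durable: true)
    index_q  = channel.queue('cached_content', durable: true)

    # Block waiting for URIs; each message is one page to fetch
    spider_q.subscribe(block: true) do |_delivery, _props, uri|
      html = URI.open(uri).read
      # Naive link extraction; push each discovered link back for later
      # (a real spider would skip URIs it has already seen)
      html.scan(/href="(http[^"]+)"/).flatten.each do |link|
        channel.default_exchange.publish(link, routing_key: spider_q.name)
      end
      # Hand the cached content to the indexer via the second queue
      channel.default_exchange.publish(html, routing_key: index_q.name)
    end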

As for the actual daemons, obviously, I'm partial to my own library (ChainGang), but it's really just a wrapper around Kernel.fork() that gives you a convenient place to deal with setup and teardown code. It's also not quite done yet. The daemon piece is far less important than the message queue, really.
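Since ChainGang's API isn't shown in the answer, the sketch below is only the general shape such a fork-based wrapper takes, with hooks for setup and teardown; it is not ChainGang itself:

    pid = Kernel.fork do
      running = true
      Signal.trap('TERM') { running = false }  # teardown trigger on kill

      # setup: open connections, load only what the worker needs

      while running
        # pop a message off the queue and process it
        sleep 1
      end

      # teardown: flush logs, close connections
    end
    Process.detach(pid)
    puts "worker running as pid #{pid}"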

Regarding the Rails environment, that's probably best left as an exercise for the reader, since memory usage is a significant factor in a long-running process. You don't want to load anything you don't have to. Incidentally, this is one area where DataMapper kicks ActiveRecord's butt soundly: environment initialization is well-documented, and there are far fewer dependencies in play, which makes the whole kit and caboodle significantly more realistic.
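As a sketch of the "load only what you need" idea, a worker might skip config/environment.rb entirely and bring up just ActiveRecord. Paths, the environment name, and the Page model are placeholders:

    require 'rubygems'
    require 'active_record'
    require 'yaml'

    # Reuse the app's database settings without booting the full Rails stack
    db_config = YAML.load_file('config/database.yml')['production']
    ActiveRecord::Base.establish_connection(db_config)

    # Load only the models this worker actually touches
    require './app/models/page'  # Page is a hypothetical model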

The one thing I don't like about cron+rake is that rake is virtually guaranteed to print to standard output, and cron tends to be excessively chatty if your cron jobs produce output. I like to put all my cron tasks in an appropriately named directory, then make a rake task that wraps them, so that it's trivial to run them manually. It's a shame that rake behaves this way, because I'd really prefer to have the option to take advantage of dependencies. In any case, you point cron directly at the scripts rather than running them via rake.
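As an illustration (paths and script names invented), the crontab points at the scripts themselves, while a generated rake task wraps them for manual runs:

    # crontab entry: run the script directly, not through rake
    # 0 * * * * /var/www/app/script/jobs/prune_sessions.rb >> /var/www/app/log/cron.log 2>&1

    # lib/tasks/jobs.rake: wrap each script so manual runs are trivial
    namespace :jobs do
      Dir['script/jobs/*.rb'].each do |script|
        desc "Run #{script} by hand"
        task File.basename(script, '.rb') do
          ruby script  # rake's built-in helper shells out to the ruby interpreter
        end
      end
    end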

I'm currently in the middle of building a web app that relies heavily on asynchronous processes, and I have to say, I'm very, very glad I decided not to use Rails.

Answered by Bob Aman


For me, not wanting to maintain a lot of extra infrastructure is a key priority, so I have used database-backed queues that are run outside of Rails.

In my case, I've used background_job and delayed_job. With background_job, the worker was kept running via cron, so there was no daemon management. With delayed_job, I'm using Heroku and letting them worry about that.

With delayed_job, you can pass in as many arguments as your background worker needs to run:

Delayed::Job.enqueue(MyJob.new(params[:one], params[:two], params[:three]))
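delayed_job just calls perform on whatever object you enqueue, so the job class that pairs with that call can be a simple Struct (MyJob and its fields are placeholders):

    class MyJob < Struct.new(:one, :two, :three)
      # The worker process dequeues the job and invokes this method
      def perform
        # do the actual work with the three arguments here
      end
    end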

I have not found a good solution to running stuff on a schedule, aside from using script/runner via cron (I prefer to use script/runner over a Rake task because I find it easier to test the code).
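A typical crontab entry for that approach (the schedule, path, and Report call are invented for illustration):

    0 2 * * * cd /var/www/app && ./script/runner -e production 'Report.generate_nightly' >> log/cron.log 2>&1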

I've never had a regularly scheduled background process that needed access to a particular Rails request, so that hasn't been too much of a problem for me.

I know there are other, cooler systems with more features but this has worked OK for me, and helps me avoid dealing with setting up a lot of new services to manage.

Answered by Luke Francl


I have a system that receives requests and then needs to call several external systems via web services. Some of these requests take longer than a user can be expected to wait, so I use an enterprise queueing system (ActiveMQ) to handle them.

I am using the ActiveMessaging plugin to do this. It allows me to marshal the request and place it on a queue for asynchronous processing, with access to the request data; however, you will need to write a polling service if you want to wait for the response.
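The general shape of an ActiveMessaging processor looks like the sketch below; the queue names, the YAML message format, and ExternalService are all placeholders:

    # app/processors/request_processor.rb
    class RequestProcessor < ApplicationProcessor
      subscribes_to :external_requests

      def on_message(message)
        data     = YAML.load(message)
        response = ExternalService.call(data)  # hypothetical web-service call
        publish :external_responses, response.to_yaml
      end
    end

On the sending side, a controller that includes ActiveMessaging::MessageSender can publish to :external_requests the same way.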

I have seen Ryan Bates' Railscast on Starling and Workling; they look promising, but I haven't used them.

Answered by Steve Weet