 

Resque or Gearman - choosing the right tool for background jobs [closed]

We are developing a web application in which about 50% of write requests end up pushing data to multiple data stores, inserting and updating a significant number of records in those data stores. To improve response time, we want to process such requests asynchronously in the background.

Our web application is being written in Ruby on Rails.

Two solutions that I'm inclined towards are Resque and Gearman.

Resque (more info here: http://github.com/blog/542-introducing-resque): Resque seems very well suited to Ruby, and it is specifically meant for background job processing. "Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work."
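To make the quoted convention concrete, here is a minimal, self-contained sketch. `UpdateDatastores` and `TinyQueue` are hypothetical names, and `TinyQueue` is an in-memory stand-in for Resque's Redis-backed queue, not the Resque API. With the real gem, the job class looks the same (a `@queue` class instance variable plus `self.perform`), you enqueue with `Resque.enqueue(UpdateDatastores, 42, ...)`, and you start a worker with something like `QUEUE=datastores rake resque:work`.

```ruby
require "json"

# Hypothetical job class following the convention quoted above:
# any class or module that responds to `perform` can be a job.
class UpdateDatastores
  @queue = :datastores # Resque reads the queue name from this class ivar

  # Arguments should be JSON-friendly (ids, strings, arrays), not live
  # objects, because the payload is serialized before it is queued.
  def self.perform(record_id, stores)
    stores.map { |store| "wrote record #{record_id} to #{store}" }
  end
end

# In-memory stand-in for a Redis-backed queue; NOT the Resque API.
module TinyQueue
  QUEUES = Hash.new { |h, k| h[k] = [] }

  # Mimics enqueueing: store the class name and JSON-encoded args.
  def self.enqueue(klass, *args)
    queue = klass.instance_variable_get(:@queue)
    QUEUES[queue] << JSON.generate("class" => klass.name, "args" => args)
  end

  # Mimics one worker iteration: pop a payload, look up the class by
  # name, and call its perform method with the decoded arguments.
  def self.work_one(queue)
    payload = QUEUES[queue].shift or return nil
    job = JSON.parse(payload)
    Object.const_get(job["class"]).perform(*job["args"])
  end
end

TinyQueue.enqueue(UpdateDatastores, 42, %w[mysql search_index])
result = TinyQueue.work_one(:datastores)
```

Because only the class name and serialized arguments travel through the queue, the web request can return as soon as `enqueue` finishes, and a separate worker process does the slow datastore writes.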

Gearman: It is not meant specifically for asynchronous background jobs, but it can definitely do that. It also appears to be more robust, at least at first glance. Another advantage of Gearman is that while your client code might be in Ruby, the worker code could be in, say, PHP. Although we are currently a pure Ruby on Rails app, we might want to use PHP or something else in the future, depending on the job at hand.
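The cross-language point works because, in Gearman, a client submits a job by function name with an opaque payload, and workers register the function names they can handle. The sketch below models only that idea in memory; `FakeJobServer` and `reindex_records` are hypothetical names, not a Gearman client library. It runs the job synchronously to stay short, although Gearman also supports fire-and-forget background submission.

```ruby
require "json"

# In-memory stand-in for a Gearman job server (hypothetical, not a client library).
class FakeJobServer
  def initialize
    @workers = {} # function name => callable taking and returning a String
  end

  # A worker (written in any language, in real Gearman) advertises
  # the function names it can handle.
  def register(function_name, &handler)
    @workers[function_name] = handler
  end

  # The client only names a function and hands over an opaque String payload.
  def submit(function_name, payload)
    raise ArgumentError, "payload must be a String" unless payload.is_a?(String)
    handler = @workers.fetch(function_name) { raise KeyError, "no worker for #{function_name}" }
    handler.call(payload)
  end
end

server = FakeJobServer.new

# This block stands in for a worker that could just as well be written
# in PHP: it sees only a JSON string, never Ruby objects.
server.register("reindex_records") do |payload|
  ids = JSON.parse(payload)["ids"]
  JSON.generate("indexed" => ids.size)
end

reply = server.submit("reindex_records", JSON.generate("ids" => [1, 2, 3]))
```

Keeping payloads in a language-neutral format such as JSON is what lets the worker side be swapped to another language later without touching the Ruby client.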

What would you recommend? Do you have experience with either of the two? What real-life production challenges should I keep in mind when choosing between them? And am I even comparing apples to apples here?

Nishith asked Mar 06 '10 10:03



1 Answer

I have some experience with Gearman from when I was looking for a distributed forking mechanism that could distribute the workload for asynchronous processing in a clustered environment.

I can tell you that it worked in a "simulated" case, where asynchronous processing was dispatched to 2 machines (2 workers on each machine = 4 workers), but not yet in a real-world scenario (make of that what you will). The real-world setup will be implemented once the "simulations" provide useful information.
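A simulation along those lines can start very small. The sketch below models the 2 machines x 2 workers layout with threads in a single Ruby process pulling from one shared queue, and checks that every job is processed exactly once. The constants and worker names are hypothetical; a real test would run separate worker processes against Resque or Gearman instead of threads.

```ruby
# Hypothetical simulation: 2 "machines" x 2 workers each = 4 workers
# pulling from one shared queue, modelled with threads in one process.
MACHINES = 2
WORKERS_PER_MACHINE = 2
JOB_COUNT = 100

jobs = Queue.new
JOB_COUNT.times { |i| jobs << i }
jobs.close # pop returns nil once the queue is closed and drained

processed = Queue.new # thread-safe collector of [worker_name, job_id]

workers = (1..MACHINES).flat_map do |m|
  (1..WORKERS_PER_MACHINE).map do |w|
    Thread.new do
      name = "machine#{m}-worker#{w}"
      while (job = jobs.pop)
        # Replace this with the real work (e.g. writes to a data store).
        processed << [name, job]
      end
    end
  end
end
workers.each(&:join)

results = Array.new(processed.size) { processed.pop }
per_worker = results.group_by(&:first).transform_values(&:size)
```

`per_worker` shows how the jobs happened to be spread across workers on a given run; it varies between runs and, with threads sharing one interpreter, says nothing about throughput on real machines.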

The mechanism you choose is only one factor in distributing the workload, so make sure you will not end up with corrupted or invalid data when the distributed "workers", running in parallel, start writing to the data stores.
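One common guard against parallel workers clobbering each other's writes is optimistic locking: each record carries a version, and a write succeeds only if the version the worker read is still current. The sketch below shows that idea with a hypothetical in-memory `VersionedStore`; it is one possible technique, not something either queue provides. In Rails, ActiveRecord offers the same pattern through a `lock_version` column, raising `ActiveRecord::StaleObjectError` on a stale update.

```ruby
# Hypothetical in-memory record store with optimistic locking:
# a write only succeeds if the version the worker read is still current.
class VersionedStore
  StaleWrite = Class.new(StandardError)

  def initialize
    @data = {} # key => [value, version]
    @lock = Mutex.new # guards only the individual read and check-and-write steps
  end

  def read(key)
    @lock.synchronize { @data.fetch(key, [0, 0]) }
  end

  def write(key, new_value, expected_version)
    @lock.synchronize do
      _, current_version = @data.fetch(key, [0, 0])
      raise StaleWrite if current_version != expected_version
      @data[key] = [new_value, current_version + 1]
    end
  end
end

# Read-modify-write with retry: a worker that loses the race re-reads and
# tries again instead of silently overwriting another worker's update.
def increment(store, key)
  loop do
    value, version = store.read(key)
    begin
      store.write(key, value + 1, version)
      return
    rescue VersionedStore::StaleWrite
      # Another worker wrote first; retry with fresh data.
    end
  end
end

store = VersionedStore.new
threads = 4.times.map do
  Thread.new { 250.times { increment(store, :counter) } }
end
threads.each(&:join)
final_value, = store.read(:counter)
```

Without the version check, two workers that read the same value could both write `value + 1`, and one increment would be lost; with it, the final count always equals the number of increments (1000 here).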

I would suggest taking the same "simulation" approach I did and running your own tests before deciding which one to use.

Regards,

Andreas answered Nov 15 '22 17:11
