
Gearman vs. Redis when writing PHP batch processors

When writing a batch processor in PHP (as in, it will obviously have to be cron-ed), what are the practical differences between using Gearman and simply storing data to be processed in Redis?

My observations thus far are that while Gearman is capable of pushing work in real time, because the PHP code only runs at intervals, using a regularly scheduled command with Redis seems more or less equivalent.

Moreover, it seems like using Gearman adds unnecessary complexity to the app by binding it to the Gearman library's dispatch lifecycle.

All this said, would it be right to assume that Gearman+PHP offers no benefits over Redis+PHP given that the batch processor will not be constantly running?

Alexander Trauzzi asked Aug 16 '13 12:08



2 Answers

Gearman is a distributed job server, whereas Redis is a distributed store, so it is a bit like comparing apples to oranges.

Now, it is possible to implement Gearman-like features with Redis (based on the list data type for instance), but it is a do-it-yourself approach. While the principle is simple, the devil is in the details.
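To give a rough idea of what the do-it-yourself approach involves, here is a minimal sketch of a list-based queue drained by a cron-style run. It is written in Python for brevity and is only an illustration under stated assumptions: `FakeRedis` is a hypothetical in-memory stand-in that mimics the Redis list commands LPUSH, RPOPLPUSH, LREM and LLEN so the example runs without a server, and the key names `jobs:pending` and `jobs:processing` are made up.

```python
import json
from collections import deque


class FakeRedis:
    """Hypothetical in-memory stand-in mimicking a few Redis list commands."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        # LPUSH: insert at the head of the list, return the new length.
        self.lists.setdefault(key, deque()).appendleft(value)
        return len(self.lists[key])

    def rpoplpush(self, src, dst):
        # RPOPLPUSH: move the tail of src to the head of dst (atomic in Redis).
        if not self.lists.get(src):
            return None
        value = self.lists[src].pop()
        self.lists.setdefault(dst, deque()).appendleft(value)
        return value

    def lrem(self, key, count, value):
        # Simplified LREM: removes at most one matching element.
        try:
            self.lists.get(key, deque()).remove(value)
            return 1
        except ValueError:
            return 0

    def llen(self, key):
        # LLEN: number of elements in the list (0 if the key is missing).
        return len(self.lists.get(key, ()))


def enqueue(r, payload):
    """Producer side: serialize the job and push it onto the pending list."""
    r.lpush("jobs:pending", json.dumps(payload))


def drain(r, handler):
    """One cron-style run: process every job currently queued, oldest first."""
    done = 0
    while True:
        raw = r.rpoplpush("jobs:pending", "jobs:processing")
        if raw is None:
            return done
        handler(json.loads(raw))
        # Acknowledge only after the handler returned; if it raises, the job
        # stays in jobs:processing where a later run could recover it.
        r.lrem("jobs:processing", 1, raw)
        done += 1
```

Recovering jobs stranded in the processing list, retries, and timeouts are all left out here, which is where the details start to bite.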

The best-known Redis-backed distributed queue implementations are for Ruby (Resque) and Python (RQ, and Celery when it is configured to use Redis as its broker). There is a port of Resque for PHP:

https://github.com/chrisboulton/php-resque

There are important points to consider when comparing Gearman to a Redis-based implementation:

  • Completion notification. Gearman notifies the client when a job completes, and jobs can be synchronous or asynchronous. Unless you implement something specific, a Redis queue will only support asynchronous jobs without completion notification.

  • High availability of the broker. Gearman offers an off-the-shelf strategy; Redis does not. While you can configure master-slave replication and use Redis Sentinel, Redis HA is not a simple problem.

  • Persistence. Gearman supports in-memory queues, but also some persistent backends (MySQL, Drizzle, SQLite, PostgreSQL). Redis offers various persistence options, but none of them is as reliable as a transactional engine like MySQL or PostgreSQL.

  • Vertical scalability. While Redis is very efficient, it is a single-threaded process. Gearmand is a multi-threaded process, so it can probably scale better (considering a single process).
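To illustrate the first point, completion notification on top of a Redis list has to be built by hand. One possible shape, sketched in Python under stated assumptions, is for the worker to push each result onto a per-job reply list that the client then reads. `FakeRedis` is again a hypothetical in-memory stand-in for the LPUSH and RPOP commands, and the `jobs` and `result:<id>` key names are invented for the example.

```python
import json
import uuid
from collections import deque


class FakeRedis:
    """Hypothetical in-memory stand-in for the LPUSH and RPOP list commands."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def rpop(self, key):
        q = self.lists.get(key)
        return q.pop() if q else None


def submit(r, payload):
    """Client side: enqueue a job and return the id used to find its result."""
    job_id = uuid.uuid4().hex
    r.lpush("jobs", json.dumps({"id": job_id, "payload": payload}))
    return job_id


def work_once(r, handler):
    """Worker side: run one job and publish its result on a per-job list."""
    raw = r.rpop("jobs")
    if raw is None:
        return False
    job = json.loads(raw)
    result = handler(job["payload"])
    r.lpush("result:" + job["id"], json.dumps(result))
    return True


def fetch_result(r, job_id):
    """Client side: return the result if it is ready, else None."""
    raw = r.rpop("result:" + job_id)
    return None if raw is None else json.loads(raw)
```

Against a real server the client would more likely block on the reply list with BRPOP and a timeout instead of polling, and the reply keys would need an expiry so abandoned results do not pile up.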

Implementing a Redis-based distributed job system is fun and interesting, but if you need something working quickly, Gearman is your best bet.

Didier Spezia answered Nov 16 '22 02:11



In addition to Didier's answer, Gearman can also coalesce requests: if, for example, several clients make an identical request before the worker finishes the job, Gearman can send the result of that single piece of work back to all of them.

From Wikipedia:

Gearman performs coalescence on the work sent by a client. If two or more clients ask for work to be completed on the same body of work, either by seeing that the same blocks are being sent or by using the unique value sent by the client, it will coalesce the work so that only one worker is used. It does this specifically to avoid thundering herd problems which are common to cache hit failures

This would be much more complicated to implement in Redis.
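As a rough sketch of why, the easy half of coalescing in Redis is refusing to enqueue a job whose unique key is already in flight, which can be done with SETNX. The Python below shows only that half. `FakeRedis` is a hypothetical in-memory stand-in for the SETNX, DEL, LPUSH and LLEN commands, and the `inflight:<unique>` and `jobs` key names are assumptions made for the example.

```python
from collections import deque


class FakeRedis:
    """Hypothetical in-memory stand-in for SETNX, DEL, LPUSH and LLEN."""

    def __init__(self):
        self.keys = {}
        self.lists = {}

    def setnx(self, key, value):
        # SETNX: set the key only if it does not exist yet.
        if key in self.keys:
            return False
        self.keys[key] = value
        return True

    def delete(self, key):
        # DEL: remove the key, return how many keys were removed.
        return 1 if self.keys.pop(key, None) is not None else 0

    def lpush(self, key, value):
        self.lists.setdefault(key, deque()).appendleft(value)

    def llen(self, key):
        return len(self.lists.get(key, ()))


def submit_unique(r, unique, payload):
    """Enqueue the job unless an identical one is already in flight."""
    if r.setnx("inflight:" + unique, "1"):
        r.lpush("jobs", unique + "|" + payload)
        return True
    return False  # another client already asked for this work


def finish(r, unique):
    """Worker calls this once the job is done, allowing new submissions."""
    r.delete("inflight:" + unique)
```

What is still missing is the part Gearman does for you: delivering the one result to every client that asked, and expiring the in-flight marker if the worker dies before calling `finish`.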

donatJ answered Nov 16 '22 01:11
