Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

asynchronous processing with PHP - one worker per job

Consider a PHP web application whose purpose is to accept user requests to start generic asynchronous jobs, and then create a worker process/thread to run the job. The jobs are not particularly CPU or memory intensive, but are expected to block on I/O calls fairly often. No more than one or two jobs should be started per second, but due to the long run times there may be many jobs running at once.

Therefore, it's of utmost importance that the jobs run in parallel. Also, each job must be monitored by a manager daemon responsible for killing hung workers, aborting workers on user request, etc.

What is the best way to go about implementing a system such as this? I can see:

  1. Forking a worker from the manager - this appears to be the lowest-level option, and I would have to implement a monitoring system myself. Apache is the web server, so it appears that this option would require any PHP workers to be started via FastCGI.
  2. Use some sort of job/message queue. (gearman, beanstalkd, RabbitMQ, etc.) - Initially, this seemed like the obvious choice. After some research, I'm somewhat confused with all of the options. For instance, Gearman looks like it's designed for huge distributed systems where there is a fixed pool of workers...so I don't know if it's right for what I need (one worker per job).
like image 629
Josh Avatar asked Aug 18 '10 14:08

Josh


2 Answers

Well, if you're on Linux, you can use pcntl_fork to fork children off. The "master" then watches the children. Each child completes its task and then exists normally.

Personally, in my implementations I've never needed a message queue. I simply used an array in the "master" with locks. When a child got a job, it would write a lock file with the job id number. The master would then wait until that child exited. If the lock file still exists after the child exited, then I know the task wasn't completed, and re-launch a child with the same job (after removing the lock file). Depending on your situation, you could implement the queue in a simple database table. Insert jobs in the table, and check the table in the master every 30 or 60 seconds for new jobs. Then only delete them from the table once the child is finished (and the child removed the lock file). This would have issues if you had more than one "master" running at a time, but you could implement a global "master pid file" to detect and prevent multiple instances...

And I would not suggest forking with FastCGI. It can result in some very obscure problems since the environment is meant to persist. Instead, use CGI if you must have it web interface, but ideally use a CLI app (a deamon). To interface with the master from other processes, you can either use sockets for TCP communication, or create a FIFO file for communication.

As for detecting hung workers, you could implement a "heart-beat" system, where the child issues a SIG_USR1 to the master process every so many seconds. Then if you haven't heard from the child in two or three times that time, it may be hung. But the thing is since PHP isn't multi-threaded, you can't tell if a child is hung or if it's just waiting on a blocking resource (like a database call)... As for implementing the "heart-beat", you could use a tick function to automate the heart-beat (but keep in mind, blocking calls still won't execute)...

like image 58
ircmaxell Avatar answered Oct 04 '22 23:10

ircmaxell


The workerpool might be interesting:

https://github.com/qxsch/WorkerPool

https://github.com/qxsch/WorkerPool/blob/master/examples/asyncExample.php

like image 33
JMW Avatar answered Oct 04 '22 21:10

JMW