 

Scalable, Delayed PHP Processing

I'm working on an online PHP application that needs delayed PHP events. Basically, I need to be able to execute arbitrary PHP code x seconds (possibly days) after the initial hit to a URL. I need fairly precise execution of these PHP events, and I also want it to be fairly scalable. I'm trying to avoid scheduling a cron job to run every second. I looked into Gearman, but it doesn't seem to provide any way to schedule events, and as I understand it, PHP isn't really meant to run as a daemon.

It would be ideal if I could tell some external process to poll an "event checker" URL on the PHP server at the exact time the next event should run. This poll time would need to be decreased or increased at will, since events can be added to and removed from the queue. Any ideas on an elegant way to accomplish this? There is simply too much overhead in calling PHP externally (parsing an HTTP request or invoking it via the CLI) to make this idea feasible for my needs.

My current plan is to write a PHP daemon that runs the events and to interface with it from the PHP server via Gearman. The daemon would be built around SplMinHeap, so hopefully the performance wouldn't be too bad. This idea leaves a bad taste in my mouth, and I was wondering if anyone had a better idea. (My ideas have changed slightly; see Edit 2.)
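An SplMinHeap-based daemon like the one described above could keep its timers roughly as in the sketch below. This is a minimal illustration under stated assumptions, not the author's code: the class names `EventHeap` and `DelayedScheduler` are hypothetical, times are passed in explicitly as Unix seconds, and cancellations (which the question expects to be frequent) use lazy deletion: a cancelled id is dropped from a lookup table and its stale heap entry is discarded when it reaches the top.

```php
<?php
// Hypothetical min-heap of [runAt, id] pairs ordered by run time.
final class EventHeap extends SplMinHeap
{
    // SplMinHeap expects a positive result when $value1 is "lower" (should come out first).
    protected function compare($value1, $value2): int
    {
        return $value2[0] <=> $value1[0];
    }
}

// Hypothetical in-process scheduler with O(1) cancellation via lazy deletion.
final class DelayedScheduler
{
    private EventHeap $heap;
    /** @var array<string, array{0: float, 1: callable}> live events keyed by id */
    private array $events = [];

    public function __construct()
    {
        $this->heap = new EventHeap();
    }

    public function schedule(string $id, float $runAt, callable $callback): void
    {
        // Re-scheduling an id overwrites it; the old heap entry becomes stale.
        $this->events[$id] = [$runAt, $callback];
        $this->heap->insert([$runAt, $id]);
    }

    public function cancel(string $id): void
    {
        unset($this->events[$id]); // heap entry is skipped when it reaches the top
    }

    /** Seconds until the next live event, or null if nothing is scheduled. */
    public function secondsUntilNext(float $now): ?float
    {
        $this->dropStale();
        return $this->heap->isEmpty() ? null : max(0.0, $this->heap->top()[0] - $now);
    }

    /** Runs every live event due at or before $now; returns their ids in run order. */
    public function runDue(float $now): array
    {
        $ran = [];
        while (true) {
            $this->dropStale();
            if ($this->heap->isEmpty() || $this->heap->top()[0] > $now) {
                break;
            }
            [, $id] = $this->heap->extract();
            $callback = $this->events[$id][1];
            unset($this->events[$id]);
            $callback();
            $ran[] = $id;
        }
        return $ran;
    }

    // Pops cancelled or superseded entries until the top is a live event.
    private function dropStale(): void
    {
        while (!$this->heap->isEmpty()) {
            [$runAt, $id] = $this->heap->top();
            if (isset($this->events[$id]) && $this->events[$id][0] === $runAt) {
                return;
            }
            $this->heap->extract();
        }
    }
}
```

In a daemon, `secondsUntilNext()` would supply the timeout for whatever blocking wait the loop uses, after which `runDue(microtime(true))` fires everything that has come due.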

EDIT:

I'm creating an online game that involves players taking turns with a variable time limit. I'm using XMPP and BOSH to push messages to and from my clients, and I've got that part done and working. Now I'm trying to add an arbitrary event that fires some time after a play from a client, to tell the client (and the other people in the game) that he took too long. I can't use a timed trigger on the client side because that would be exploitable (since the client can play by themselves). Hope that helps.

EDIT 2:

Thank you all for your feedback. While I think most of your ideas would work well on a small scale, I have a feeling they either wouldn't scale very well (external event manager) or lack the exactness this project requires (cron). Also, in both cases they are external pieces that could fail and add complexity to an already complex system.

I personally feel that the only clean solution that meets the requirements for this project is to write a PHP daemon that handles the delayed events. I've begun writing what I think is the first PHP runloop. It handles watching the sockets and executing delayed PHP events. Hopefully, when I'm closer to being done with this project, I can post the source if any of you are interested. So far in testing it has proven to be a promising solution (no problems with memory leaks or instability).
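The key idea of a runloop that both watches sockets and executes delayed events is that one call does both waits: `stream_select()` is given a timeout equal to the time remaining until the next delayed event. A minimal sketch, assuming the caller supplies the streams, a read callback, the due time of the next event and a function that runs due events; `selectTimeout()` and `runLoopOnce()` are hypothetical names and not LooPHP's API.

```php
<?php
// Hypothetical helper: converts "time until next event" into stream_select()'s
// separate seconds/microseconds arguments, capped at $maxWait so the loop
// still wakes periodically when nothing is scheduled.
function selectTimeout(?float $nextDueAt, float $now, float $maxWait = 60.0): array
{
    $wait = $nextDueAt === null ? $maxWait : min($maxWait, max(0.0, $nextDueAt - $now));
    $sec = (int) floor($wait);
    $usec = (int) (($wait - $sec) * 1000000); // truncated so it stays below 1,000,000
    return [$sec, $usec];
}

// One iteration of a hypothetical run loop: wait for readable streams or for
// the next timer, whichever comes first, then fire whatever timers are due.
function runLoopOnce(array $streams, callable $onReadable, ?float $nextDueAt, callable $runDue): void
{
    [$sec, $usec] = selectTimeout($nextDueAt, microtime(true));
    if ($streams) {
        $read = $streams;
        $write = null;
        $except = null;
        // stream_select() reduces $read to the streams that are readable.
        if (stream_select($read, $write, $except, $sec, $usec) > 0) {
            foreach ($read as $stream) {
                $onReadable($stream);
            }
        }
    } else {
        usleep($sec * 1000000 + $usec); // nothing to watch: sleep until the timer
    }
    $runDue(microtime(true));
}
```

A daemon would call `runLoopOnce()` in a `while (true)` loop, recomputing `$nextDueAt` from its timer structure each time round.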

EDIT 3: Here is a link to the PHP event loop library called LooPHP for those who are interested.

TL;DR Requirements

  • Call (preferably natively) PHP at a delayed time (ranging from seconds to days)
  • Handle creation/updating/deletion of events arbitrarily (I'm expecting a high number of canceled calls)
  • Handle a high load of scheduled events (100-1000 a second per server)
  • Calls should be within one second of their scheduled time
  • At this point I'm not open to rewriting the code base in another language (maybe some day I will)
Kendall Hopkins asked Jun 25 '10 02:06


5 Answers

Have your PHP script make an exec() call that schedules your script to run at the time you need, using the at command:

exec('echo "/usr/bin/php myscript.php" | at 22:56');

at executes commands at a specified time.

from the man page:

At allows fairly complex time specifications, extending the POSIX.2 standard. It accepts times of the form HH:MM to run a job at a specific time of day. (If that time is already past, the next day is assumed.) You may also specify midnight, noon, or teatime (4pm) and you can have a time-of-day suffixed with AM or PM for running in the morning or the evening. You can also say what day the job will be run, by giving a date in the form month-name day with an optional year, or giving a date of the form MMDDYY or MM/DD/YY or DD.MM.YY. The specification of a date must follow the specification of the time of day. You can also give times like now + count time-units, where the time-units can be minutes, hours, days, or weeks and you can tell at to run the job today by suffixing the time with today and to run the job tomorrow by suffixing the time with tomorrow.

Further, if you need one second time resolution, have your script run at the start of the minute, then just sleep n seconds until it is time to execute.
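Two details matter when combining at with that sleep trick. First, at reads the commands to run from standard input (or from a file given with `-f`), so the job is piped in rather than appended after the time. Second, at only schedules to the minute, so the job can be queued for the start of the target minute with `at -t [[CC]YY]MMDDhhmm` and then sleep for the remaining seconds. A minimal sketch, assuming PHP's default timezone matches the server's local time and `/usr/bin/php` is the CLI binary; `planAtJob()` and `buildAtPipeline()` are hypothetical helper names.

```php
<?php
// Hypothetical helpers for queuing a PHP script with at(1) at second precision.
// at(1) fires at the start of a minute, so the job is queued for the start of
// the target minute and sleeps for the remaining seconds before running PHP.
function planAtJob(int $targetTs): array
{
    $minuteStart = intdiv($targetTs, 60) * 60;
    // `at -t` expects CCYYMMDDhhmm in local time; date() uses PHP's default timezone.
    return [date('YmdHi', $minuteStart), $targetTs - $minuteStart];
}

function buildAtPipeline(string $script, int $targetTs): string
{
    [$atTime, $sleepSeconds] = planAtJob($targetTs);
    $job = "sleep {$sleepSeconds}; /usr/bin/php " . escapeshellarg($script);
    // at(1) reads the job's commands from standard input.
    return 'echo ' . escapeshellarg($job) . ' | at -t ' . $atTime;
}
```

Usage would be along the lines of `exec(buildAtPipeline('myscript.php', time() + 90));`. Targets less than a minute away may fall in the current (already started) minute, which at may refuse, so those are better run directly by a process that is already up.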

Zak answered Nov 02 '22 04:11


I think a PHP-only solution will be hard (almost impossible) to implement. I came up with two solutions to your problem.

PHP/Redis solution

Questions asked by Kendall:

  • How stable is Redis?

Redis is very stable. The developer writes really clean C code; you should check it out on GitHub ;). A lot of big sites also use Redis, for example GitHub, which had a really interesting blog post about how they made GitHub fast :). Superfeedr uses Redis too, and there are many more big companies using it. I would advise you to google for it ;).

  • How PHP-friendly is Redis?

Redis is very PHP-friendly. A lot of users are writing PHP libraries for Redis, and the protocol is really simple: you can debug it with telnet ;). From a quick look, Predis, for example, implements the blocking pop.

  • How would I remove events?

I think you should use something like the ZREM command.

Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis also supports several kinds of sorting.

What I came up with (originally pseudo-code; shown here with the phpredis extension, and the key names are illustrative):

processor.php:

<?php
// processor.php: start as many of these as you want workers, e.g. nohup php processor.php &
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
while (true) {
    // Block (up to 5 s, then loop again) until wakeup.php moves a due event onto the work list.
    $item = $redis->blPop(['events:due'], 5);
    if (!$item) {
        continue; // timed out, nothing due
    }
    process($item[1]); // process() is your own code; $item is [key, value]
}

client.php:

<?php
// client.php: runs in the web request that adds or deletes an event.
// $action, $event and $runAt (Unix time at which to run) come from your application.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
if ($action === 'add') {
    $redis->zAdd('events:scheduled', $runAt, $event); // sorted set, score = run time
} elseif ($action === 'delete') {
    $redis->zRem('events:scheduled', $event);
}
// Wake wakeup.php so it recomputes how long to sleep.
$redis->lPush('events:wakeup', '1');

wakeup.php:

<?php
// wakeup.php: run exactly one instance, e.g. nohup php wakeup.php &
// http://code.google.com/p/redis/wiki/IntroductionToRedisDataTypes => read the sorted set part.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
while (true) {
    // Move every event whose time has come onto the work list for processor.php.
    foreach ($redis->zRangeByScore('events:scheduled', '-inf', (string) time()) as $event) {
        if ($redis->zRem('events:scheduled', $event)) { // skip events deleted in the meantime
            $redis->rPush('events:due', $event);
        }
    }
    // Sleep until the earliest event is due or client.php signals a change (at most 30 s).
    $first = $redis->zRange('events:scheduled', 0, 0, true); // [member => score]
    $timeout = $first ? (int) ceil(reset($first) - time()) : 30;
    $redis->blPop(['events:wakeup'], max(1, min($timeout, 30)));
}

Along these lines you could write a pretty efficient scheduler using only PHP (okay, Redis is C, so it's kickass fast :)). I would also like to code this solution, so stay tuned ;). I think I could write a usable prototype in a day....

My java solution

This morning I created a Java program that I think you can use for your problem.

  1. download:

    Visit GitHub's download page to download the jar file (with all dependencies included).

  2. install:

    java -jar schedule-broadcaster-1.0-SNAPSHOT-jar-with-dependencies-1277709762.jar

  3. Run simple PHP snippets

    1. First php -f scheduler.php
    2. Next php -f receiver.php

  4. Questions

    I created these little snippets so that hopefully you will understand how to use my program. There is also a little documentation in the wiki.

App Engine's TaskQueue

A quick solution would be to use Google App Engine's Task Queue, which has a reasonable free quota; beyond that you pay for what you use.

Using this model, App Engine's Task Queue API allows you to specify tasks as HTTP Requests (both the contents of the request as its data, and the target URL of the request as its code reference). Programmatically referring to a bundled HTTP request in this fashion is sometimes called a "web hook."

Importantly, the offline nature of the Task Queue API allows you to specify web hooks ahead of time, without waiting for their actual execution. Thus, an application might create many web hooks at once and then hand them off to App Engine; the system will then process them asynchronously in the background (by 'invoking' the HTTP request). This web hook model enables efficient parallel processing - App Engine may invoke multiple tasks, or web hooks, simultaneously.

To summarize, the Task Queue API allows a developer to execute work in the background, asynchronously, by chunking that work into offline web hooks. The system will invoke those web hooks on the application's behalf, scheduling for optimal performance by possibly executing multiple webhooks in parallel. This model of granular units of work, based on the HTTP standard, allows App Engine to efficiently perform background processing in a way that works with any programming language or web application framework.

Alfred answered Nov 02 '22 04:11


This seems like the perfect place for an event queue in a database.

Have your user-created events (triggered by visiting the web page) create an entry in the DB that includes the instructions for the action to take place and the timestamp for when it should happen. Your daemon (either a persistent application or one triggered by cron) checks the DB for events that should already have happened ($TriggerTime <= time()) and have not yet been flagged as "processed". If it finds one or more of these events, it executes the instructions and finally marks each event as "processed" in the DB, or simply deletes the entry.

The bonus of using the DB to store the events (rather than something resident in an application's RAM) is that you can recover from a crash without data loss, you can have more than one worker reading events at the same time, and you can modify events easily.
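The worker step can be sketched with PDO as below, assuming a hypothetical `events` table with `id`, `trigger_time` (Unix seconds), `payload` and `processed` columns; `processDueEvents()` is a hypothetical name. The conditional `UPDATE` acts as a claim, so when several workers see the same row only the one whose update actually changes it goes on to run it. Note that this sketch marks the row before executing it (at-most-once); the order described above, executing first and then marking, gives at-least-once behaviour instead.

```php
<?php
// Hypothetical schema:
// CREATE TABLE events (id INTEGER PRIMARY KEY, trigger_time INTEGER NOT NULL,
//                      payload TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)

/** Claims and runs every due, unprocessed event; returns the ids this worker ran. */
function processDueEvents(PDO $db, int $now, callable $execute): array
{
    $select = $db->prepare(
        'SELECT id, payload FROM events
         WHERE trigger_time <= ? AND processed = 0 ORDER BY trigger_time'
    );
    $select->execute([$now]);
    $due = $select->fetchAll(PDO::FETCH_ASSOC);

    // Only succeeds for a row that no other worker has claimed yet.
    $claim = $db->prepare('UPDATE events SET processed = 1 WHERE id = ? AND processed = 0');
    $ran = [];
    foreach ($due as $row) {
        $claim->execute([$row['id']]);
        if ($claim->rowCount() === 1) {
            $execute($row['payload']);
            $ran[] = (int) $row['id'];
        }
    }
    return $ran;
}
```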

Also, lots of folks use PHP as a general daemon scripting language on servers. Cron can start a PHP script (which first checks whether an instance of that "app" is already running) that checks the event queue every so often. You can have a little app that exits after a minute of inactivity and then gets restarted by cron. The app can check the DB for entries at a fast frequency of your choosing (like once per second), whereas cron on its own cannot run anything more often than once per minute.
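The cron-restarted worker described above might look like the sketch below. `acquireSingleInstanceLock()`, `runUntilIdle()` and `$checkEventQueue` are hypothetical names; the single-instance check uses a non-blocking `flock()` on a lock file, and the clock and sleep functions are passed in so the idle logic can be exercised without real waiting. Cron would start the script every minute, and a run that cannot get the lock exits immediately.

```php
<?php
// Returns an exclusively locked handle, or null if another instance holds the lock.
// Keep the handle open for the worker's lifetime; the lock is released when it exits.
function acquireSingleInstanceLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Polls once per second and returns after $maxIdleSeconds without any work,
// so the script can exit; $pollOnce returns how many events it processed.
function runUntilIdle(callable $pollOnce, int $maxIdleSeconds, callable $clock, callable $sleep): int
{
    $polls = 0;
    $lastWork = $clock();
    while ($clock() - $lastWork < $maxIdleSeconds) {
        $polls++;
        if ($pollOnce() > 0) {
            $lastWork = $clock();
        }
        $sleep(1);
    }
    return $polls;
}

// Example entry point, run from cron every minute:
// if ($lock = acquireSingleInstanceLock('/tmp/event-worker.lock')) {
//     runUntilIdle($checkEventQueue, 60, 'time', 'sleep');
// }
```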

Evan answered Nov 02 '22 03:11


You could use Node.js, an event-driven, JavaScript-based server environment. Run it on a secret, internal port with a script that receives notifications from the PHP script and then schedules the action to run xx seconds later (for example with setTimeout). The action in Node.js could be as simple as running a PHP script on the main web server.
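On the PHP side, notifying such a scheduler could be as small as writing one line to the internal port. A minimal sketch, assuming a hypothetical newline-delimited JSON protocol (fields `id`, `delay` and `url`) that the Node.js script would have to parse; the function names and the `event-checker.php` URL are illustrative, not part of any standard.

```php
<?php
// Hypothetical wire format: one JSON object per line.
function encodeScheduleRequest(string $id, int $delaySeconds, string $callbackUrl): string
{
    return json_encode(
        ['id' => $id, 'delay' => $delaySeconds, 'url' => $callbackUrl],
        JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    ) . "\n";
}

// Sends one request to the scheduler listening on an internal host/port.
function notifyScheduler(string $host, int $port, string $message): bool
{
    $socket = @stream_socket_client("tcp://{$host}:{$port}", $errno, $errstr, 1.0);
    if ($socket === false) {
        return false; // scheduler unreachable; the caller decides how to handle it
    }
    $written = fwrite($socket, $message);
    fclose($socket);
    return $written === strlen($message);
}
```

For example, `notifyScheduler('127.0.0.1', 9000, encodeScheduleRequest('turn-42', 30, 'http://127.0.0.1/event-checker.php'))`, where port 9000 is an arbitrary choice.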

Colin O'Dell answered Nov 02 '22 04:11


I also recommend the queue strategy, but you seem to dislike using the database as a queue. You've got an XMPP infrastructure, so leverage it: use a pubsub node and publish your events to it. Pubsub nodes can optionally be configured to store unfetched items persistently.

Your daemon process (no matter what language) can fetch all stored items at startup time and subscribe to changes to get notified about incoming actions. This way you can solve your problem in an elegant, asynchronous way.
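Publishing an event to a pubsub node is an ordinary XEP-0060 publish request. A minimal sketch of building that stanza in PHP, assuming a hypothetical payload element in the made-up namespace `urn:example:delayed-event`; `buildPublishIq()` is also a hypothetical name. Persistence is a property of the node (its `pubsub#persist_items` configuration option), not of the publish request itself.

```php
<?php
// Builds an XEP-0060 <publish/> request carrying one delayed event.
// The <event/> payload and its namespace are hypothetical.
function buildPublishIq(string $service, string $node, string $itemId, int $runAt, string $action): string
{
    // Escape text and attribute values for XML.
    $e = fn (string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return sprintf(
        "<iq type='set' to='%s' id='%s'>"
            . "<pubsub xmlns='http://jabber.org/protocol/pubsub'>"
            . "<publish node='%s'><item id='%s'>"
            . "<event xmlns='urn:example:delayed-event' run-at='%d'>%s</event>"
            . '</item></publish></pubsub></iq>',
        $e($service),
        $e('publish-' . $itemId),
        $e($node),
        $e($itemId),
        $runAt,
        $e($action)
    );
}
```

Using the item id as the event id means a later publish with the same id replaces the pending event, which fits the frequent updates the question expects.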

towe75 answered Nov 02 '22 03:11