 

PHP: prevent timeouts by using modules that use HTTP requests

I use PHP and have about 10 tasks that need to run. Each one of them on its own should not time out, but all 10 tasks together might.

Is it a good solution to use a modular approach with separate HTTP requests?

Something like this:

http://example.com/some/module/fetch
http://example.com/some/module/parse
http://example.com/some/module/save

Each of these URLs would do one task. If it succeeds, it triggers the next task from within that task, a kind of chain reaction: one page calls the next (with cURL).

Pros and cons? Is it a good approach? If not, what is a better alternative?

asked Aug 24 '16 by Jens Törnell

2 Answers

The modular approach is a good idea (if one "unit" fails, the job stops as you desire; plus it's simpler to debug/test each individual unit).

It will work, but your approach to chaining has some issues:

  • if there is a bottleneck (i.e. one "unit" takes longer than the others) then you may end up with 100 instances of the bottleneck process all running, and you lose control of server resources
  • there is a lack of control; say the server needs to be rebooted: to restart the jobs you then have to start them all again from the beginning.
  • similarly, if you need to stop/start/debug an individual unit while it is running, you'll have to restart the job from the first unit to repeat it.
  • by making a web request, you are using Apache/nginx resources, memory, socket connections etc. just to run a PHP process. You could run the PHP process directly without that overhead.
  • and finally, if the site is on a DMZ'd web server, the server might not actually be able to make requests to itself.

To get more control, you should use a queuing system for this kind of operation.

Using PHP (or any language, really), your basic process is:

  1. each "unit" is a continuously looping PHP script that never ends*

  2. each "unit" process listens to a queuing system; when a job it can handle arrives on the queue, it takes it off the queue

  3. when a unit is finished with a job, it acknowledges the job as handled and pushes the result onto the next queue.

  4. if the unit decides the job should not continue, it still acknowledges the job as handled but doesn't push anything onto the next queue.

Advantages:

  • if a "unit" stops, the job remains on the queue and can be collected when you restart the "unit". This makes it easier to restart the units/server, or to recover if one unit crashes.
  • if one "unit" is very heavy, you can just start a second process doing exactly the same work if you have spare server capacity. If there is no spare capacity, you accept the bottleneck; either way you have a very transparent view of how much resource you are using.
  • if you decide that another language will handle a task better, you can mix Node.js, Python, Ruby and so on; they can all talk to the same queues.

Side note on "continually looping PHP": this is done by setting max_execution_time "0". Make sure that you don't cause "memory leaks" and have cleanm . You can auto-start the process on boot (systemd, or task scheduler depending on OS) or run manually for testing. If you don't want to have it continuously looping, timeout after 5 minutes and have cron/task scheduler restart.

Side note on queues: you can "roll your own" using a database or memory cache for simple applications (e.g. a database-backed queue can easily cope with 100,000 items an hour), but avoiding conflicts / managing state and retries is a bit of an art. A better option is RabbitMQ (https://www.rabbitmq.com/). It's a bit of a niggle to install, but once you've installed it, follow the PHP tutorials and you'll never look back!
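
To make the unit/queue idea concrete, here's a rough sketch of a "fetch" unit using the php-amqplib library from the RabbitMQ PHP tutorials. The queue names and the do_fetch() helper are invented for the example, and exact method names can differ slightly between php-amqplib versions:

// Sketch of a "fetch" unit consuming from a RabbitMQ queue via php-amqplib.
// Queue names and do_fetch() are hypothetical examples.
require_once __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// durable queues so jobs survive a broker restart
$channel->queue_declare('fetch', false, true, false, false);
$channel->queue_declare('parse', false, true, false, false);

$callback = function (AMQPMessage $msg) use ($channel) {
    $result = do_fetch($msg->getBody());   // hypothetical: the unit's actual work

    if ($result !== null) {
        // push the result onto the next unit's queue
        $channel->basic_publish(new AMQPMessage($result), '', 'parse');
    }

    // acknowledge only after the work (and the hand-off) has succeeded
    $msg->ack();
};

// manual acks (no_ack = false) so an unacknowledged job goes back on the queue if the worker dies
$channel->basic_consume('fetch', '', false, false, false, false, $callback);

while ($channel->is_consuming()) {
    $channel->wait();
}

Whatever kicks off the chain (a cron job, a controller, etc.) would simply basic_publish() the initial message onto the "fetch" queue in the same way.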

answered Sep 18 '22 by Robbie


Assuming you want to stay with separate HTTP requests, one option is to set a timeout on each request, each time reduced by the time already spent:

function doTaskWithEnd($uri, $end, $ctx = null) {
    if (!$ctx) { $ctx = stream_context_create(); }
    // give this request only the time remaining until the overall deadline
    stream_context_set_option($ctx, "http", "timeout", $end - time());
    $ret = file_get_contents($uri, false, $ctx);
    if ($ret === false) {
        throw new \Exception("Request failed or timed out!");
    }
    return $ret;
}

$end = time() + 100;
$fetched = doTaskWithEnd("http://example.com/some/module/fetch", $end);
$ctx = stream_context_create(["http" => ["method" => "POST", "content" => $fetched]]);
$parsed = doTaskWithEnd("http://example.com/some/module/parsed", $end, $ctx);
$ctx = stream_context_create(["http" => ["method" => "PUT", "content" => $parsed]]);
doTaskWithEnd("http://example.com/some/module/save", $end, $ctx);

Or alternatively, with a non-blocking solution (let's use amphp/amp + amphp/artax for this):

function doTaskWithTimeout($requestPromise, $timeout) {
    $ret = yield \Amp\first($requestPromise, $timeout);
    if ($ret === null) {
        throw new \Exception("Timed out!");
    }
    return $ret;
}

\Amp\execute(function() {
    $end = new \Amp\Pause(100000); /* timeout in ms */

    $client = new \Amp\Artax\Client;
    $fetched = yield from doTaskWithTimeout($client->request("http://example.com/some/module/fetch"), $end);
    $req = (new \Amp\Artax\Request)
        ->setUri("http://example.com/some/module/parsed")
        ->setMethod("POST")
        ->setBody($fetched)
    ;
    $parsed = yield from doTaskWithTimeout($client->request($req), $end);
    $req = (new \Amp\Artax\Request)
        ->setUri("http://example.com/some/module/save")
        ->setMethod("PUT")
        ->setBody($parsed)
    ;
    yield from doTaskWithTimeout($client->request($req), $end);
});

Now, I ask: do you really want to offload this to separate requests? Can't we just assume there are functions fetch(), parse($fetched) and save($parsed) in one script?

In that case it's easy; we can just set up an alarm:

declare(ticks=10); // this declare() line must happen before the first include/require
pcntl_signal(\SIGALRM, function() {
    throw new \Exception("Timed out!");
});
pcntl_alarm(100);

$fetched = fetch();
$parsed = parse($fetched);
save($parsed);

pcntl_alarm(0); // we're done, reset the alarm

Alternatively, the non-blocking solution works too (assuming fetch(), parse($fetched) and save($parsed) properly return promises and are written in a non-blocking way):

\Amp\execute(function() {
    $end = new \Amp\Pause(100000); /* timeout in ms */
    $fetched = yield from doTaskWithTimeout(fetch(), $end);
    $parsed = yield from doTaskWithTimeout(parse($fetched), $end);
    yield from doTaskWithTimeout(save($parsed), $end);
});

If you just want a global timeout for different sequential tasks, I'd prefer to do it all in one script with pcntl_alarm(); alternatively, go with the stream context timeout option.

The non-blocking solutions are mainly applicable if you need to do other things at the same time, e.g. if you want to run that fetch+parse+save cycle multiple times, independently of each other.

answered Sep 20 '22 by bwoebi