I use PHP and have about 10 tasks that need to run. Each one of them on its own should not time out, but all 10 tasks together might.
Is it a good solution to use a modular approach with separate HTTP requests?
Something like this:
http://example.com/some/module/fetch
http://example.com/some/module/parse
http://example.com/some/module/save
Each of these URLs does one task. If it succeeds, it triggers the next task: a kind of chain reaction, where one URL calls the next (with curl).
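Roughly, I imagine each URL doing something like this (just a sketch; the data source here is a placeholder, and this simple version waits for the next step to finish before returning):

// fetch endpoint — a sketch of one link in the chain; the data source
// and the payload format are made-up placeholders
$data = file_get_contents('https://example.org/source-data'); // this step's own work

if ($data !== false) {
    // on success, trigger the next step in the chain via curl
    $ch = curl_init('http://example.com/some/module/parse');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, ['payload' => $data]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch); // note: this blocks until the next step has finished
    curl_close($ch);
}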
Pros and cons? Is it a good approach? If not, what is a better alternative?
The modular approach is a good idea (if one "unit" fails, the job stops as you desire; plus it's simpler to debug/test each individual unit).
It will work, but your approach to chaining has some drawbacks.
To get more control, you should use a queuing system for this kind of operation.
Using PHP (or any language, really), your basic process is:
each "unit" is a continuously looping php script that never ends*
each "unit" process listens to a queuing system; when a job arrives on the queue that it can handle then it takes it off the queue
when each unit is finished with the job, it confirms handled and pushes to the next queue.
if the unit decides the job should not continue, confirm the job handled but don't push to the next queue.
Advantages: no single HTTP request has to stay alive for the whole job (so the timeout problem disappears), a failed job stops at its queue where it can be inspected or retried, and each unit stays easy to debug and test on its own.
Side note on "continually looping PHP": this is done by setting max_execution_time "0". Make sure that you don't cause "memory leaks" and have cleanm . You can auto-start the process on boot (systemd, or task scheduler depending on OS) or run manually for testing. If you don't want to have it continuously looping, timeout after 5 minutes and have cron/task scheduler restart.
Side note on queues: you can "roll your own" using a database or memory cache for simple applications (e.g. a database-backed queue can easily cope with 100,000 items an hour), but avoiding conflicts and managing state/retries is a bit of an art. A better option is RabbitMQ (https://www.rabbitmq.com/). It's a bit of a niggle to install, but once you've installed it, follow the PHP tutorials and you'll never look back!
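For illustration only, a "parse" unit using RabbitMQ and a recent php-amqplib (the client used in the official PHP tutorials) might look roughly like this; the queue names and the parse step itself are just placeholders:

// composer require php-amqplib/php-amqplib
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// this unit consumes from "fetched" and pushes results onto "parsed" (durable queues)
$channel->queue_declare('fetched', false, true, false, false);
$channel->queue_declare('parsed', false, true, false, false);

$callback = function (AMQPMessage $msg) use ($channel) {
    $parsed = strtoupper($msg->body);   // stand-in for the real parse step

    if ($parsed !== '') {
        // success: push the result onto the next queue
        $channel->basic_publish(new AMQPMessage($parsed), '', 'parsed');
    }
    // either way, confirm this job as handled
    $msg->ack();
};

$channel->basic_consume('fetched', '', false, false, false, false, $callback);

// loop forever, handling one job at a time as it arrives
while ($channel->is_consuming()) {
    $channel->wait();
}

The web request that kicks the whole job off then only has to publish one small message onto the first queue and can return immediately, so no HTTP request ever runs long enough to time out.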
Assuming you want to use HTTP requests, one option is to give each request a timeout equal to whatever time is left until an overall deadline:
function doTaskWithEnd($uri, $end, $ctx = null) {
    if (!$ctx) { $ctx = stream_context_create(); }
    // give this request only the time that is left until the overall deadline
    stream_context_set_option($ctx, "http", "timeout", $end - time());
    $ret = file_get_contents($uri, false, $ctx);
    if ($ret === false) {
        throw new \Exception("Request failed or timed out!");
    }
    return $ret;
}
$end = time() + 100; // overall deadline: 100 seconds from now
$fetched = doTaskWithEnd("http://example.com/some/module/fetch", $end);
$ctx = stream_context_create(["http" => ["method" => "POST", "content" => $fetched]]);
$parsed = doTaskWithEnd("http://example.com/some/module/parse", $end, $ctx);
$ctx = stream_context_create(["http" => ["method" => "PUT", "content" => $parsed]]);
doTaskWithEnd("http://example.com/some/module/save", $end, $ctx);
Or alternatively, with a non-blocking solution (let's use amphp/amp + amphp/artax for this):
function doTaskWithTimeout($requestPromise, $timeout) {
    // the request races against the $timeout pause; if the pause wins, $ret is null
    $ret = yield \Amp\first($requestPromise, $timeout);
    if ($ret === null) {
        throw new \Exception("Timed out!");
    }
    return $ret;
}
\Amp\execute(function() {
    $end = new \Amp\Pause(100000); /* timeout in ms */
    $client = new \Amp\Artax\Client;
    $fetched = yield from doTaskWithTimeout($client->request("http://example.com/some/module/fetch"), $end);
    $req = (new \Amp\Artax\Request)
        ->setUri("http://example.com/some/module/parse")
        ->setMethod("POST")
        ->setBody($fetched)
    ;
    $parsed = yield from doTaskWithTimeout($client->request($req), $end);
    $req = (new \Amp\Artax\Request)
        ->setUri("http://example.com/some/module/save")
        ->setMethod("PUT")
        ->setBody($parsed)
    ;
    yield from doTaskWithTimeout($client->request($req), $end);
});
Now, I ask, do you really want to offload to separate requests? Can't we just assume there are three functions, fetch(), parse($fetched) and save($parsed)?
In this case it's easy: we can just set up an alarm:
declare(ticks=10); // this declare() line must happen before the first include/require
pcntl_signal(\SIGALRM, function() {
    throw new \Exception("Timed out!");
});
pcntl_alarm(100);
$fetched = fetch();
$parsed = parse($fetched);
save($parsed);
pcntl_alarm(0); // we're done, reset the alarm
Alternatively, the non-blocking solution works too (assuming fetch(), parse($fetched) and save($parsed) properly return promises and are designed to be non-blocking):
\Amp\execute(function() {
    $end = new \Amp\Pause(100000); /* timeout in ms */
    $fetched = yield from doTaskWithTimeout(fetch(), $end);
    $parsed = yield from doTaskWithTimeout(parse($fetched), $end);
    yield from doTaskWithTimeout(save($parsed), $end);
});
If you just want a global timeout for different sequential tasks, I'd preferably do it all in one script with pcntl_alarm(); alternatively, go with the stream context timeout option.
The non-blocking solutions are mainly applicable if you need to do other things at the same time, e.g. if you want to run that fetch+parse+save cycle multiple times, with each cycle independent of the others.