I have this code I wrote for a routing component, here is the full code, which basically builds a batch of any number of large regexes (depending on a configuration value) out of many small (route-specific) ones in order to validate if a given request can or cannot be mapped by a registered route.
The general process involves first decorating each route, so that routes with placeholders like
/path/to/:variable_name
turn into a regex like like:
/path/to/(?P<R1V1>[^/]+)
And at verification time, this routes get glued together in a single regex per batch of arbitrary size.
For 'normal' usage, say, 500 routes with 1-4 placeholders, it works pretty well. But while benchmarking it, I noticed that for an extremely large number of placeholders AND and extremely large amount of routes (currently 11 placeholders and 50000 routes) the code I have fails at finding the last registered route.
I cannot figure out why. To the best of my knowledge, the thing should behave roughly in the same manner, taking (maybe? my O notation is rusty) O(n*m) times per order of magnitude (n being the amount of batches and m being how many regexes are there in a batch).
Maybe it's something in the way I'm testing this? If it's not, could you please point me to any problematic thing I'm doing?
If it is of any use, the benchmark I'm using is this exact one here
<?php
require dirname(__FILE__).'/../vendor/autoload.php';
$router = new \CFV\Router();
$dispatcher = new \CFV\Dispatcher();
$dispatcher::$ROUTES_PER_LOT = 20;
// $dispatcher::$THROW_ON_FAIL = true;
$callback = function (){};
$num_args = 11;
$routes_amount = 50000;
$matches_amount = 1;
$args = implode('/', array_map(function($i){ return ':arg' . $i; }, range(1, $num_args)));
$params = implode('/', array_map(function($i){ return '_arg' . $i; }, range(1, $num_args)));
$last_tried = '';
$load_start = microtime(true);
for ($i = 0, $str = 'a'; $i < $routes_amount; $i++, $str++) {
$router->connect("/$str/$args", $callback);
$last_tried = "/$str/$params";
}
printf("Took: %fs to load all\n", microtime(true) - $load_start);
$dispatcher->setRouter($router);
$search_start = microtime(true);
$found = $dispatcher->dispatch($last_tried);
printf("Took: %fs searching all\n", microtime(true) - $search_start);
Any pointers whatsoever would be just great.
There are some limits. From the manual:
The maximum length of a subject string is the largest positive number that an integer variable can hold. However, PCRE uses recursion to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of a subject string that can be processed by certain patterns.
Also, there are PCRE's recursion limit and backtrack limit
If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it via pcre_free afterwards. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, as there is no way to give an out-of-memory error from within a recursion.
Ref:
http://php.net/manual/en/regexp.reference.recursive.php
http://php.net/pcre.configuration#ini.pcre.recursion-limit
http://php.net/pcre.configuration#ini.pcre.backtrack-limit
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With