
RegEx not working for long pattern PCRE's JIT compiler stack limit - PHP7

I am using oyejorge's less compiler. The following line

list-style-image: url("");

triggers an exception. I narrowed it down and created a test script:

$regex = '/\\G"((?:[^"\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)"|\'((?:[^\'\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)\'/';

$image = '"");';

$a = preg_match($regex, $image, $b);

var_dump($a);
var_dump($b);

This works on PHP 5.5 and 5.6, but on some PHP 7 hosts the result array is empty. Any idea why?

Amazan23 asked Jan 18 '16 07:01


3 Answers

PHP 7 introduces PCRE's JIT compiler. It can change whether and how inefficient regexes execute, particularly on long inputs.

https://3v4l.org/Y58It

preg_last_error() returns 6, which is PREG_JIT_STACKLIMIT_ERROR.
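To make that error code easier to spot in your own test script, a small lookup like the one below can help. This is only a sketch: `pregErrorName` is a hypothetical helper name, not part of PHP, and it covers just the error constants available in PHP 7.0.

```php
<?php
// Hypothetical helper (not a PHP built-in): translate the integer returned by
// preg_last_error() into the name of the matching PREG_* constant.
function pregErrorName(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR', // PHP 7.0+
    ];

    // Fall back to a generic label for codes this table does not know about.
    return $names[$code] ?? "unknown PCRE error ($code)";
}
```

Calling `pregErrorName(preg_last_error())` right after a `preg_match()` that returned `false` should tell you whether the JIT stack limit was the cause. On PHP 8.0 and later, the built-in `preg_last_error_msg()` serves a similar purpose.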

Rewrite the regex to be more efficient, typically by removing extraneous capturing groups or being more careful with quantifiers. As a workaround you can disable JIT.

https://3v4l.org/Y1pja

So you can make it work with the solution below:

// Turn off PCRE's JIT compiler for this script.
ini_set('pcre.jit', '0');
print_r(ini_get_all('pcre'));

$regex = '/\\G"((?:[^"\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)"|\'((?:[^\'\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)\'/';

$image = '"");';

$a = preg_match($regex, $image, $b);

//var_dump($a);
var_dump($b);
var_dump(preg_last_error());
Chetan Ameta answered Oct 21 '22 15:10


PCRE JIT uses a 32K machine stack by default (you can change this at PCRE compile time, but people rarely do). The stack can be extended to any maximum through the JIT stack interface. If PHP supports this interface, it likely provides a configuration option for it. If not, it is worth requesting support for it.

Blame Windows for this complexity. If everybody used pthreads, there wouldn't be such a problem.

dark100 answered Oct 21 '22 15:10


Your regex is terribly inefficient. Regex101.com had your original at 4702 steps, but adding a little possessiveness brought that down to 20. Proof.

$regex = '/\\G"((?:[^"\\\\\r\n]++|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)"|\'((?:[^\'\\\\\r\n]++|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)\'/';
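If you want to convince yourself that the possessive version still matches the same strings, a quick comparison along these lines may help. It is a sketch only: `sameResult` and the sample subjects are made up for illustration, and it checks matching behavior, not the step counts.

```php
<?php
// Original pattern from the question and the possessive variant from this answer.
$original   = '/\\G"((?:[^"\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)"|\'((?:[^\'\\\\\r\n]|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)\'/';
$possessive = '/\\G"((?:[^"\\\\\r\n]++|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)"|\'((?:[^\'\\\\\r\n]++|\\\\.|\\\\\r\n|\\\\[\n\r\f])*)\'/';

// Hypothetical helper: true when both patterns give the same return value
// and the same captured groups for the given subject.
function sameResult(string $a, string $b, string $subject): bool
{
    $ra = preg_match($a, $subject, $ma);
    $rb = preg_match($b, $subject, $mb);

    return $ra === $rb && $ma === $mb;
}

// Illustrative subjects (assumptions, not taken from the less compiler).
$samples = [
    '"");',          // the failing input from the question
    '"a\\"b" rest',  // double-quoted string containing an escaped quote
    "'single' x",    // single-quoted string
    'no quotes',     // no match: \G requires a quote at the start
];

foreach ($samples as $subject) {
    var_dump(sameResult($original, $possessive, $subject));
}
```

The possessive quantifier is safe here because every other alternative in the group starts with a backslash, which the character class excludes, so there is nothing useful to backtrack into.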

TL;DR: Don't use configuration to work around bad regexen.

Walf answered Oct 21 '22 15:10