Fixed-length regex lookbehind complains of variable-length lookbehind

Tags: regex, php
Here is the code I am trying to run:

$str = 'a,b,c,d';
return preg_split('/(?<![^\\\\][\\\\]),/', $str);

As you can see, the regexp being used here is:

/(?<![^\\][\\]),/

Which is a simple fixed-length negative lookbehind for "preceded by something that isn't a backslash, then something that is!".

This regex works just fine on http://www.phpliveregex.com

But when I go and actually attempt to run the above code, I am spat back the error:

Warning:  preg_split() [function.preg-split]: Compilation failed: lookbehind assertion is not fixed length at offset 13

To make matters worse, a fellow programmer tested the code on his 5.4.24 PHP server, and it worked fine.

This leads me to believe that my issues are related to the configuration of my server, which I have very little control over. I am told that my PHP version is 5.2.*

Are there any workarounds/alternatives to preg_split() that might not have this issue?

Georges Oates Larsen asked Sep 29 '22 21:09


1 Answer

The problem is caused by the bug fixed in PCRE 6.7. Quoting the changelog:

A negated single-character class was not being recognized as fixed-length in lookbehind assertions such as (?<=[^f]), leading to an incorrect compile error "lookbehind assertion is not fixed length"
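To see whether a given server is affected, one option is to print the PCRE version PHP reports and to probe the lookbehind from the changelog entry directly. This is only a sketch: as far as I know the `PCRE_VERSION` constant is defined from PHP 5.2.4 on, hence the `defined()` guard, and the probe pattern and variable names are just illustrations.

```php
<?php
// PCRE_VERSION holds the library version and release date as a string.
// Assumption: it may be missing on very old builds, so fall back to 'unknown'.
$pcreVersion = defined('PCRE_VERSION') ? PCRE_VERSION : 'unknown';
echo 'PHP ', PHP_VERSION, ', PCRE ', $pcreVersion, "\n";

// Direct probe: try to compile the lookbehind quoted in the changelog.
// @ hides the compile warning; preg_match() returns false when compilation fails.
$compiles = @preg_match('/(?<=[^f])x/', 'ax') !== false;

echo $compiles
    ? "negated class accepted in lookbehind\n"
    : "lookbehind bug present\n";
```

If the probe fails, the patterns have to avoid negated character classes inside lookbehinds on that machine.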

PCRE 6.7 was introduced in PHP 5.2.0, in Nov 2006. Since you still hit this bug, the fix is evidently not present on your server (possibly because PHP there was built against an older PCRE library). So for a preg_split-based workaround you have to use a pattern without a negated character class inside the lookbehind. For example:

$patt = '/(?<!(?<!\\\\)\\\\),/';
// or...
$patt = '/(?<![\x00-\x5b\x5d-\xFF]\x5c),/';
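As a quick sanity check, both patterns can be run through preg_split on the same input. The sample string and the array labels below are assumptions made for illustration; `array()` is used instead of `[]` so the snippet also parses on PHP 5.2.

```php
<?php
// Both patterns avoid a negated character class inside the lookbehind.
$patterns = array(
    'nested lookbehind'    => '/(?<!(?<!\\\\)\\\\),/',
    'explicit byte ranges' => '/(?<![\x00-\x5b\x5d-\xFF]\x5c),/',
);

// Hypothetical input: the first comma is escaped with a single backslash.
$str = 'a\\,b,c';

$results = array();
foreach ($patterns as $label => $pattern) {
    // Split only on commas that the lookbehind does not reject.
    $results[$label] = preg_split($pattern, $str);
    echo $label, ': ', implode(' | ', $results[$label]), "\n";
}
```

For this input both variants should print `a\,b | c`: the escaped comma stays inside the first field and only the second comma splits.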

However, I find the whole approach a bit weird: what if the comma is preceded by exactly three backslashes? Or five? Or any odd number of them? The comma in this case should be considered 'escaped', but obviously you cannot create a lookbehind expression of variable length to cover these cases.
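A small sketch of that failure, using the nested-lookbehind workaround pattern and a made-up input with three backslashes before the comma:

```php
<?php
// Workaround pattern: split on a comma unless it follows a lone backslash.
$pattern = '/(?<!(?<!\\\\)\\\\),/';

// Hypothetical input: a, three backslashes, a comma, b.
// With an odd number of backslashes the comma should count as escaped.
$str = 'a\\\\\\,b';

$parts = preg_split($pattern, $str);

// The lookbehind only inspects the two characters before the comma. It finds
// a backslash preceded by another backslash, decides the comma is not
// escaped, and splits anyway.
var_dump($parts);
```

The result has two elements (the `a` with its three backslashes, and `b`), although the intended result is a single field.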

On second thought, one can use preg_match_all instead, with a common alternation trick to cover the escaped symbols:

$str = 'e ,a\\,b\\\\,c\\\\\\,d\\\\';
preg_match_all('/(?:[^\\\\,]|\\\\(?:.|$))+/', $str, $matches);
var_dump($matches[0]);
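The same idea can be wrapped in a small helper. This is just a sketch: the function name `split_unescaped_commas` is hypothetical, and it leaves the backslashes in the returned fields rather than unescaping them.

```php
<?php
// Hypothetical helper; not part of PHP itself.
function split_unescaped_commas($str)
{
    // A field is a run of either:
    //   - any character other than a backslash or a comma, or
    //   - a backslash followed by any single character (or by end of string).
    preg_match_all('/(?:[^\\\\,]|\\\\(?:.|$))+/', $str, $matches);
    return $matches[0];
}

// Same test string as above, including the trailing double backslash.
$fields = split_unescaped_commas('e ,a\\,b\\\\,c\\\\\\,d\\\\');
print_r($fields);
```

This should yield three fields: `e `, `a\,b\\` and `c\\\,d\\`. One caveat: because the group must match at least one character, empty fields (as in `a,,b`) are skipped rather than returned as empty strings.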

I really think I covered all the issues here; those trailing backslashes were a killer :)

raina77ow answered Oct 03 '22 00:10