
Consequences of Inserting Positive Lookbehind into Arbitrary Regex to Simulate Byte Offset

What would be the consequences of inserting an n-byte positive lookbehind, (?<=\C{n}), at the beginning of an arbitrary regular expression, particularly when the expression is used for replacement operations?

At least within PHP, the regex match functions preg_match and preg_match_all allow matching to begin at a given byte offset. None of the other PHP PCRE functions has a corresponding feature: preg_replace, for instance, lets you limit the number of replacements, but you cannot require that the replaced matches occur after the first n bytes.
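To make the difference concrete, here is a minimal sketch of the two signatures in use. The subject string and variable names are made up for illustration.

```php
<?php
$subject = 'abc abc abc';

// The fifth argument of preg_match is the byte offset where the search starts.
preg_match('/abc/', $subject, $m, PREG_OFFSET_CAPTURE, 1);

// With PREG_OFFSET_CAPTURE, $m[0][1] is the match position in the ORIGINAL string.
$firstAfterOffset = $m[0][1]; // 4

// preg_replace accepts a $limit (fourth argument) but has no offset argument.
$limited = preg_replace('/abc/', 'X', $subject, 1); // "X abc abc"
```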

There would obviously be some (let's call them trivial) consequences for performance and readability, but would there be any non-trivial impacts, such as matches becoming non-matches (other than those that begin before byte n) or replacements becoming malformed?

Some examples:

/some expression/ becomes /(?<=\C{4})some expression/ for a 4-byte offset

/(this) has (groups)/i becomes /(?<=\C{2})(this) has (groups)/i for a 2-byte offset
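A rough sketch of how the two transformations above could be applied with preg_replace. The helper with_byte_offset is hypothetical (not a PHP built-in). It assumes the pattern uses "/" as its delimiter, has no /u modifier (PCRE rejects \C inside a lookbehind in UTF mode), has no top-level alternation (the lookbehind would bind only to the first alternative), and that the PCRE build in use permits \C.

```php
<?php
// Hypothetical helper: insert (?<=\C{n}) right after the opening "/" delimiter.
// Assumptions: "/" delimiter, no /u modifier, no top-level alternation.
function with_byte_offset(string $pattern, int $offset): string
{
    return '/(?<=\C{' . $offset . '})' . substr($pattern, 1);
}

// First example: 4-byte offset.
$p1 = with_byte_offset('/some expression/', 4);
// $p1 is '/(?<=\C{4})some expression/'
$r1 = preg_replace($p1, 'X', 'some expression, then some expression');
// The occurrence at byte 0 has fewer than 4 bytes before it, so it is skipped.

// Second example: 2-byte offset with capture groups and the /i modifier.
$p2 = with_byte_offset('/(this) has (groups)/i', 2);
// $p2 is '/(?<=\C{2})(this) has (groups)/i'
$r2 = preg_replace($p2, '$2 has $1', 'This has groups; this has groups');
// The lookbehind adds no capture group, so $1 and $2 keep their numbering.
```

In this sketch only the second occurrence is replaced in each subject, which is the behaviour I would expect from a real offset parameter.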

As far as I can tell, and from the limited tests that I've run, adding this lookbehind effectively simulates the offset parameter and doesn't interfere with any other lookbehinds, substitutions, or control patterns, but I'm not an expert on regex.

I'm trying to determine whether there are any likely consequences to building replace/filter function extensions that insert the n-byte lookbehind into patterns. The result should behave exactly like the match functions' offset parameter, so simply running the expression against substr( $subject, $offset ) won't work, for the same reasons it doesn't for preg_match: most notably, it cuts off the text that lookbehinds need to see, and ^ then incorrectly matches the start of the substring rather than the start of the original string.
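For reference, a small sketch of the substr() problem described above, next to the behaviour I'm trying to reproduce. The subject and patterns are invented for the illustration, and the last check assumes a PCRE build that permits \C and a pattern without the /u modifier.

```php
<?php
$subject = 'foo foo';
$offset  = 4;

// Offset argument: ^ still means the start of $subject, so there is no match.
$anchorOffset = preg_match('/^foo/', $subject, $m, 0, $offset);

// substr(): ^ now matches the start of the substring, a false positive.
$anchorSubstr = preg_match('/^foo/', substr($subject, $offset));

// Offset argument: the lookbehind can still see the bytes before the offset.
$behindOffset = preg_match('/(?<=foo )foo/', $subject, $m, 0, $offset);

// substr(): the bytes the lookbehind needs have been cut away.
$behindSubstr = preg_match('/(?<=foo )foo/', substr($subject, $offset));

// Inserted n-byte lookbehind on the full subject: agrees with the offset case.
$anchorSimulated = preg_match('/(?<=\C{4})^foo/', $subject);
$behindSimulated = preg_match('/(?<=\C{4})(?<=foo )foo/', $subject);
```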