What would be the consequences of inserting a positive lookbehind for n bytes, (?<=\C{n}), at the beginning of an arbitrary regular expression, particularly when it is used for replacement operations?
At least in PHP, the regex match functions preg_match and preg_match_all allow matching to begin after a given byte offset. None of the other PCRE functions in PHP has a corresponding feature: you can limit the number of replacements done by preg_replace, for instance, but you cannot require that the replaced matches occur after n bytes.
There would obviously be some (let's call them trivial) consequences for performance and readability, but would there be any non-trivial impacts, such as matches becoming non-matches (other than those that start within the first n bytes, which are meant to be excluded) or replacements becoming malformed?
Some examples:
/some expression/ becomes /(?<=\C{4})some expression/ for a 4-byte offset
/(this) has (groups)/i becomes /(?<=\C{2})(this) has (groups)/i for a 2-byte offset
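To make the transformation concrete, here is a minimal sketch of how the lookbehind could be spliced into a pattern before calling preg_replace. The helper name with_byte_offset is hypothetical, not a PHP function. It assumes the pattern string begins with its delimiter, has no top-level alternation, and does not use the u modifier; none of that is checked.

```php
<?php
// Hypothetical helper (not part of PHP): insert an n-byte lookbehind
// immediately after the opening delimiter of $pattern.
// Assumptions: $pattern starts with its delimiter, contains no top-level
// alternation, and is compiled in byte mode (no "u" modifier).
function with_byte_offset(string $pattern, int $offset): string
{
    if ($offset <= 0) {
        return $pattern; // nothing to skip
    }
    return $pattern[0] . '(?<=\C{' . $offset . '})' . substr($pattern, 1);
}

// First example from above: a 4-byte offset.
$pattern1 = with_byte_offset('/abc/', 4);
$result1  = preg_replace($pattern1, 'X', 'abc abc abc');

// Second example: the lookbehind is not a capturing group,
// so $1 and $2 still refer to (this) and (groups).
$pattern2 = with_byte_offset('/(this) has (groups)/i', 2);
$result2  = preg_replace($pattern2, '$2 $1', 'This has groups; this has groups');

var_dump($pattern1, $result1, $pattern2, $result2);
```

In this sketch the occurrence at byte 0 is left alone in both cases, because fewer than n bytes precede it, while later occurrences are replaced as usual.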
As far as I can tell, and from the limited tests that I've run, adding this lookbehind effectively simulates the offset parameter and doesn't interfere with any other lookbehinds, substitutions, or other control patterns; but I'm also not an expert on regex.
I'm trying to determine whether there are any likely consequences to building replace/filter function extensions by inserting the n-byte lookbehind into patterns. It should operate just as the match functions' offset parameter does, so simply running the expression against substr( $subject, $offset ) won't work, for the same reasons it doesn't for preg_match: most notably, it cuts off any lookbehinds, and ^ then incorrectly matches the start of the substring rather than the start of the original string.
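The difference between the offset parameter and substr() can be seen in a small sketch. The subject string and patterns below are made up for illustration; the last two calls show the inserted lookbehind agreeing with the offset parameter in these two cases only, which is not a general proof.

```php
<?php
$subject = 'foobar';
$offset  = 3;

// ^ with the offset parameter still anchors at the true start of $subject.
$anchorOffset = preg_match('/^bar/', $subject, $m, 0, $offset);
// ^ with substr() anchors at the start of the substring instead.
$anchorSubstr = preg_match('/^bar/', substr($subject, $offset));

// A lookbehind with the offset parameter can still see "foo" before byte 3.
$behindOffset = preg_match('/(?<=foo)bar/', $subject, $m, 0, $offset);
// With substr() the "foo" has been cut away, so the lookbehind fails.
$behindSubstr = preg_match('/(?<=foo)bar/', substr($subject, $offset));

// The same two patterns with the 3-byte lookbehind inserted, run on the
// whole subject with no offset parameter.
$anchorLookbehind = preg_match('/(?<=\C{3})^bar/', $subject);
$behindLookbehind = preg_match('/(?<=\C{3})(?<=foo)bar/', $subject);

var_dump($anchorOffset, $anchorSubstr, $behindOffset, $behindSubstr,
         $anchorLookbehind, $behindLookbehind);
```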