We allow some user-supplied REs for the purpose of filtering email. Early on we ran into performance issues with REs that contained, for example, .* when matching against arbitrarily large emails. We found that a simple solution was to apply s/\*/{0,1024}/ to the user-supplied RE. However, this is not a perfect solution, as it breaks the following pattern:
/[*]/
Rather than coming up with some convoluted recipe to account for every possible variation of user-supplied RE input, I'd like to just limit Perl's interpretation of the * and + quantifiers to a maximum of 1024 repetitions.
Is there any way to do this?
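To make the failure concrete, here is a minimal sketch of the naive rewrite. The helper name cap_naive is hypothetical, and the /g flag is an addition (the substitution as quoted above only replaces the first *).

```perl
use strict;
use warnings;

# Hypothetical helper: the naive rewrite from the question.
# /g is added here so every * is rewritten, not just the first one.
sub cap_naive {
    my ($pattern) = @_;
    (my $capped = $pattern) =~ s/\*/{0,1024}/g;
    return $capped;
}

print cap_naive('foo.*bar'), "\n";   # foo.{0,1024}bar
print cap_naive('[*]'), "\n";        # [{0,1024}]

# The rewritten character class no longer matches a literal *.
# It now matches any one of the characters { 0 , 1 2 4 } instead.
my $broken = cap_naive('[*]');
print '*' =~ /^$broken$/ ? "yes" : "no", "\n";   # no
print '0' =~ /^$broken$/ ? "yes" : "no", "\n";   # yes
```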
This does not really answer your question, but you should be aware of other issues with user-supplied regular expressions; see, for example, this summary at OWASP. Depending on your exact situation, it might be better to write or find a simple custom pattern-matching library.
Update
Added a (?<!\\) before the quantifiers, because an escaped * or + should not be replaced. The replacement will still fail on \\* (match a literal \ 0 or more times), because that * is preceded by a backslash even though it is not itself escaped.
An improvement would be this
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/
s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/
See it here on Regexr
That means: match * or + (think [*+]), but only if there is no closing ] ahead with no [ before it. In addition, no \ (the (?<!\\) part) is allowed before the quantifier or before the closing square bracket.
(?! ... ) is a negative lookahead
(?<! ... ) is a negative lookbehind
See perlretut for details
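As a rough check of the two substitutions above, the following sketch wraps them in a helper and runs them over a few patterns. The name cap_quantifiers is hypothetical, and the /g flag is an addition so that every quantifier in the pattern is rewritten. The last case shows the known failure with an escaped backslash followed by *.

```perl
use strict;
use warnings;

# Hypothetical helper applying both substitutions from this answer.
# /g is an addition; the regexes themselves are unchanged.
sub cap_quantifiers {
    my ($pattern) = @_;
    $pattern =~ s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g;
    $pattern =~ s/(?<!\\)\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g;
    return $pattern;
}

# Single-quoted strings: \\ is one backslash, \* stays as backslash-star.
print cap_quantifiers('foo.*bar'), "\n";   # foo.{0,1024}bar
print cap_quantifiers('[*]'), "\n";        # [*]
print cap_quantifiers('[+*]x+'), "\n";     # [+*]x{1,1024}
print cap_quantifiers('a\*b'), "\n";       # a\*b   (escaped *, left alone)
print cap_quantifiers('x\\\\*'), "\n";     # x\\*   (known failure: * is not capped)
```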
Update 2: include possessive quantifiers
s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/ # for +
s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/ # for *
See it here on Regexr
It seems to be working, but it's getting really complicated now!
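A small sketch to exercise the possessive-aware version, assuming Perl 5.10 or later (where possessive quantifiers exist). The helper name cap_possessive_aware is hypothetical and /g is again an addition. The + substitution runs first, as listed above, so that the + in *+ is still preceded by * when it is examined.

```perl
use strict;
use warnings;

# Hypothetical helper applying the "Update 2" substitutions in the listed order.
sub cap_possessive_aware {
    my ($pattern) = @_;
    $pattern =~ s/(?<!(?<!\\)[\\+*?])\+(?!(?<!\\)[^[]*?(?<!\\)\])/{1,1024}/g;  # for +
    $pattern =~ s/(?<!\\)\*(?!(?<!\\)[^[]*?(?<!\\)\])/{0,1024}/g;              # for *
    return $pattern;
}

print cap_possessive_aware('a*+b'), "\n";   # a{0,1024}+b  (possessive + kept)
print cap_possessive_aware('a++b'), "\n";   # a{1,1024}+b
print cap_possessive_aware('a\++'), "\n";   # a\+{1,1024}  (escaped +, then a real +)
print cap_possessive_aware('[+]+'), "\n";   # [+]{1,1024}

# The rewritten pattern should still compile and match.
my $re = cap_possessive_aware('a*+b');
print 'aaab' =~ /^$re$/ ? "match" : "no match", "\n";   # match
```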
Get a parse tree using Regexp::Parser and modify the regex as you want, or provide a GUI interface to Regexp::English.
You mean other than patching the source?