Assuming a Perl script that allows users to specify several text filter expressions in a config file, is there a safe way to let them enter regular expressions as well, without the possibility of unintended side effects or code execution? Without actually parsing the regexes and checking them for problematic constructs, that is. There won't be any substitution, only matching.
As an aside, is there a way to test if the specified regex is valid before actually using it? I'd like to issue warnings if something like /foo (bar/
was entered.
Thanks, Z.
use re 'eval'
pragma is used:
(?{code})
(??{code})
${code}
@{code}
The default is no re 'eval'
; so unless I'm missing something, it should be safe to read regular expressions from a file, with the only check being the eval/catch posted by Axeman. At least I haven't been able to hide anything evil in them in my tests.
Thanks again. Z.
The Regex class itself is thread safe and immutable (read-only). That is, Regex objects can be created on any thread and shared between threads; matching methods can be called from any thread and never alter any global state.
Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts.
Regular expressions are useful in search and replace operations. The typical use case is to look for a sub-string that matches a pattern and replace it with something else. Most APIs using regular expressions allow you to reference capture groups from the search pattern in the replacement string.
Being more specific with your regular expressions, even if they become much longer, can make a world of difference in performance. The fewer characters you scan to determine the match, the faster your regexes will be.
Depending on what you're matching against, and the version of Perl you're running, there might be some regexes that act as an effective denial of service attack by using excessive lookaheads, lookbehinds, and other assertions.
You're best off allowing only a small, well-known subset of regex patterns, and expanding it cautiously as you and your users learn how to use the system. In the same way that many blog commenting systems allow only a small subset of HTML tags.
Eventually Parse::RecDescent might become useful, if you need to do complex analysis of regexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With