This answer explains that to validate an arbitrary regular expression, one simply uses eval:
while (<>) {
eval "qr/$_/;";
print $@ ? "Not a valid regex: $@\n" : "That regex looks valid\n";
}
However, this strikes me as very unsafe, for what I hope are obvious reasons. Someone could input, say:
foo/; system('rm -rf /'); qr/
or whatever devious scheme they can devise.
The natural way to prevent such things is to escape special characters, but if I escape too many characters, I severely limit the usefulness of the regex in the first place. A strong argument can be made, I believe, that at least []{}()/-,.*?^$! and whitespace characters (and probably others) ought to be permitted, unescaped, in a user regex interface for the regexes to have even minimal usefulness.
Is it possible to secure myself from regex injection, without limiting the usefulness of the regex language?
The solution is simply to change
eval("qr/$_/")
to
eval("qr/\$_/")
This can be written more clearly as follows:
eval('qr/$_/')
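To see why the quoting matters, here is a minimal sketch that feeds the same hostile-looking input to both forms. The flag `$main::injected` and the payload are hypothetical stand-ins for the `system(...)` call in the question, chosen so the demo is harmless to run.

```perl
use strict;
use warnings;

$main::injected = 0;   # flag that the stand-in payload tries to set

# Hypothetical hostile input, with an assignment in place of system(...).
local $_ = q{foo/; $main::injected = 1; qr/};

# Double quotes: Perl splices the input into the source text, so eval
# compiles and runs:  qr/foo/; $main::injected = 1; qr//;
{
    no warnings;       # the spliced-in code can trigger void-context warnings
    eval "qr/$_/;";
}
my $after_double = $main::injected;

$main::injected = 0;

# Single quotes: eval sees the literal text qr/$_/, and the contents of $_
# are only ever interpreted as a pattern, never as Perl code.
eval 'qr/$_/;';
my $after_single = $main::injected;

print "double-quoted eval ran the payload: ", ($after_double ? "yes" : "no"), "\n";
print "single-quoted eval ran the payload: ", ($after_single ? "yes" : "no"), "\n";
```

In the single-quoted case the input still compiles, but only as an odd-looking pattern in which `$` is an ordinary end-of-line anchor.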
But that's still not optimal. The following would be far better as it doesn't involve generating and compiling Perl code at run-time:
eval { qr/$_/ }
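As a rough sketch, the block form could be wrapped in a small helper. The name `compile_user_regex` and its return convention are assumptions made for illustration, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (compiled regex, undef) on success,
# or (undef, error message) if the pattern does not compile.
sub compile_user_regex {
    my ($pattern) = @_;
    # Block eval only traps the exception; no Perl source is generated,
    # so the pattern text is never parsed as code.
    my $re = eval { qr/$pattern/ };
    return defined $re ? ($re, undef) : (undef, $@);
}

# Sample inputs: a valid pattern, an unbalanced one, and an injection attempt.
for my $input ('fo+', '(', q{foo/; print "pwned"; qr/}) {
    my ($re, $err) = compile_user_regex($input);
    print defined $re
        ? "That regex looks valid: $input\n"
        : "Not a valid regex: $err\n";
}
```

The injection attempt is reported as valid here because, treated purely as pattern text, it happens to be a legal regex; none of it is executed.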
Note that neither solution protects you from denial-of-service attacks. It's quite easy to write a pattern that will take longer than the life of the universe to complete. To handle that situation, you could execute the regex match in a child process for which a CPU ulimit has been set.
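One way to sketch the child-process idea with core Perl only is shown below. Note that it swaps in a wall-clock `alarm` for the CPU ulimit described above, since setting a true CPU limit from Perl needs a non-core module or a shell wrapper. The helper name `match_with_timeout` is hypothetical, and the sketch assumes a Unix-like system with a real `fork`.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 on match, 0 on no match, and undef if
# the pattern is invalid or the child was killed before it finished.
sub match_with_timeout {
    my ($pattern, $text, $seconds) = @_;

    my $re = eval { qr/$pattern/ };
    return undef unless defined $re;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default disposition, SIGALRM terminates the
        # process at the OS level, even in the middle of a long match.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;
        my $matched = $text =~ $re ? 1 : 0;
        POSIX::_exit($matched ? 0 : 1);   # skip END blocks and buffer flushes
    }

    # Parent: wait for the child and decode its exit status.
    waitpid($pid, 0);
    return undef if $? & 127;             # child died from a signal
    return ($? >> 8) == 0 ? 1 : 0;
}

my $result = match_with_timeout('^a+b$', 'aaab', 2);
print defined $result ? "result: $result\n" : "invalid pattern or timed out\n";
```

The child only reports a yes/no answer through its exit code; returning captured groups would need a pipe or similar channel back to the parent.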