Are there any security concerns if I run a user-defined regular expression on my server against a user-defined input string? I'm not asking about a single language but about any language, with PHP being one of the main languages I would like to know about.
For example, if I have the code below:
<?php
if (isset($_POST['regex'])) {
    preg_match($_POST['regex'], $_POST['match'], $matches);
    var_dump($matches);
}
?>
<form action="" method="post">
    <input type="text" name="regex">
    <textarea name="match"></textarea>
    <input type="submit">
</form>
Provided this is not a controlled environment (i.e. the user can't be trusted), what are the risks of the above code? If similar code is written in other languages, are there risks there as well? If so, which languages are affected?
I already found out about 'evil regular expressions'; however, no matter what I try on my computer, they seem to work fine, as shown below.
PHP
<?php
php > preg_match('/^((ab)*)+$/', 'ababab', $matches);var_dump($matches);
array(3) {
[0] =>
string(6) "ababab"
[1] =>
string(0) ""
[2] =>
string(2) "ab"
}
php > preg_match('/^((ab)*)+$/', 'abababa', $matches);var_dump($matches);
array(0) {
}
JavaScript
phantomjs> /^((ab)*)+$/g.exec('ababab');
{
"0": "ababab",
"1": "ababab",
"2": "ab",
"index": 0,
"input": "ababab"
}
phantomjs> /^((ab)*)+$/g.exec('abababa');
null
This leads me to believe that PHP and JavaScript have a fail-safe mechanism for evil regexes. Based on that, I would assume that other languages have similar features.
Is this a correct assumption?
Finally, for any or all of the languages that may be vulnerable, are there any ways to make sure the regular expressions don't cause damage?
When you run a user-defined regex against a user-defined string on your server, the user can craft a regex that backtracks catastrophically, usually paired with a failing input, to cause a denial of service on your system.
Using your example ^((ab)*)+$, you need a slightly longer, failing input to cause catastrophic backtracking to take effect: "ababababababababababababababababababababababd".
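To see why the length of the failing input matters, here is a rough sketch that times the pattern from the question against failing inputs of increasing length. The helper names failingInput and timeMatch and the chosen lengths are my own, not from the question, and the timings will vary between engines and machines; an engine with a backtracking limit may simply give up early instead of slowing down.

```javascript
// Pattern from the question; the nested quantifiers are what allow the
// engine to try many different ways of splitting up the input.
const evil = /^((ab)*)+$/;

// Hypothetical helper: "ab" repeated `pairs` times, then a "d" so the match fails.
function failingInput(pairs) {
  return 'ab'.repeat(pairs) + 'd';
}

// Hypothetical helper: run one match attempt and report the elapsed milliseconds.
function timeMatch(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
}

// The lengths are kept small on purpose: on a backtracking engine each extra
// pair can roughly double the work, so much longer inputs may appear to hang.
for (const pairs of [10, 14, 18, 22]) {
  const { matched, ms } = timeMatch(evil, failingInput(pairs));
  console.log(`pairs=${pairs} matched=${matched} ms=${ms}`);
}
```

A short failing input such as 'abababa' finishes instantly, which is why the tests in the question looked harmless.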
In PHP (PCRE), preg_match returns false for this input because the backtracking limit is exceeded; preg_last_error should return PREG_BACKTRACK_LIMIT_ERROR.
In JavaScript, the result depends on the browser. On Firefox, the match returns false, while on Chrome 31.0.1650.63 m and Internet Explorer 11, catastrophic backtracking can be observed.
Depending on the API of the language/library, there may be an option to limit the number of backtracking attempts or to set a time-out on the operation; it is strongly recommended that you set such a limit in order to prevent DoS on your server.
In .NET, for example, the Regex class comes with an API to limit the time taken for matching.
If the language doesn't come with such a convenient API, it is strongly recommended that you implement your own mechanism to time out the execution.
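As one possible way to build such a time-out yourself, here is a minimal sketch for Node.js that runs the match through the built-in vm module and its timeout option. The helper name matchWithTimeout and the 100 ms budget are assumptions for illustration only. Note that vm is not a security sandbox; it is used here purely to interrupt a long-running match, and only strings are handed to it.

```javascript
const vm = require('vm');

// Hypothetical helper: run an untrusted pattern against an untrusted input and
// abort the match if it takes longer than timeoutMs milliseconds.
function matchWithTimeout(pattern, input, timeoutMs) {
  // Only plain strings cross into the context; the script text itself is fixed,
  // so no user-supplied code is evaluated.
  const sandbox = vm.createContext({ pattern, input, result: null });
  try {
    vm.runInContext('result = new RegExp(pattern).exec(input);', sandbox, {
      timeout: timeoutMs,
    });
    return { timedOut: false, match: sandbox.result };
  } catch (err) {
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { timedOut: true, match: null };
    }
    throw err; // for example, a SyntaxError from an invalid pattern
  }
}

// Example: a failing input long enough to backtrack heavily, with a 100 ms budget.
const outcome = matchWithTimeout('^((ab)*)+$', 'ab'.repeat(26) + 'd', 100);
console.log(outcome.timedOut ? 'aborted after time-out' : outcome.match);
```

The call is synchronous, so the server thread is still blocked for up to the time-out on each request; keep the budget small, or move the match to a worker or separate process if that matters for your setup.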
Unless the specification of the regex engine includes a requirement to prevent catastrophic backtracking (e.g. PCRE has a default backtracking limit), you shouldn't rely on the behavior of a specific implementation (like the case of Firefox described above).