Is it considered safe to use user-defined expressions with std::regex (e.g. for a server-side search)? Does the standard library make any guarantees about the safety of broken expressions?
The standard requires an implementation to throw a std::regex_error exception when the regex passed to the constructor is invalid.
[regex.construct-3]:
explicit basic_regex(const charT* p, flag_type f = regex_constants::ECMAScript);
Requires: p shall not be a null pointer.
Throws: regex_error if p is not a valid regular expression.
Effects: Constructs an object of class basic_regex; the object's internal finite state machine is constructed from the regular expression contained in the array of charT of length char_traits<charT>::length(p) whose first element is designated by p, and interpreted according to the flags f.
Ensures: flags() returns f. mark_count() returns the number of marked sub-expressions within the expression.
The standard even includes a table of the possible error kinds (the regex_constants::error_type values, which regex_error::code() reports).
So as long as you do not pass a null pointer, there should be no undefined behavior in creating a regex from a user-provided string.
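As a minimal sketch of what that looks like in practice, the helper below wraps construction in a try/catch. The name compile_user_pattern is made up for illustration; only std::regex, std::regex_error and std::optional come from the standard library.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): compiles a user-supplied
// pattern and returns std::nullopt instead of propagating the exception.
std::optional<std::regex> compile_user_pattern(const std::string& pattern) {
    try {
        // Taking a std::string avoids the null-pointer precondition of the
        // const charT* overload quoted above.
        return std::regex(pattern, std::regex_constants::ECMAScript);
    } catch (const std::regex_error&) {
        // The exception's code() holds a regex_constants::error_type value
        // and what() an implementation-defined message, if you want to log them.
        return std::nullopt;
    }
}
```

A caller would check the returned optional and answer the request with a "bad pattern" error when it is empty, rather than letting the exception escape.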
Note that any practical implementation may of course still have bugs that lead to security vulnerabilities. The standard also doesn't guarantee that a malicious user cannot DoS your system by submitting a very complex or self-referential regex that produces too many matches or uses too much memory or CPU, so you'll have to guard against that yourself. But if you are only worried about whether an invalid regex can lead to UB, the answer is "no, you're fine".
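One possible way to reduce the DoS exposure, sketched under assumptions: cap the size of both the pattern and the text being searched, and treat any regex_error as a rejection. The helper name guarded_search and the two limits are invented for this example; the limits are arbitrary and do not guarantee bounded running time, they only shrink the attack surface.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// Illustrative limits; the values are assumptions, tune them for your workload.
constexpr std::size_t kMaxPatternLength = 256;
constexpr std::size_t kMaxSubjectLength = 4096;

// Hypothetical helper: returns true/false for "found"/"not found", or
// std::nullopt when the request is rejected.
std::optional<bool> guarded_search(const std::string& pattern,
                                   const std::string& subject) {
    // Reject oversized input before doing any regex work.
    if (pattern.size() > kMaxPatternLength ||
        subject.size() > kMaxSubjectLength) {
        return std::nullopt;
    }
    try {
        const std::regex re(pattern);
        return std::regex_search(subject, re);
    } catch (const std::regex_error&) {
        // Either the pattern is invalid, or the implementation gave up while
        // matching (the standard defines error_complexity and error_stack
        // for that case, but an implementation is not obliged to detect it).
        return std::nullopt;
    }
}
```

Because a short pattern can still backtrack heavily, a server would typically combine this with a time or resource limit outside the regex library, for example by running searches in a worker that can be abandoned.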