Edit: tchrist has informed me that my original accusations about Perl's insecurity are unfounded. However, the question still stands.
I know that in Perl, you can embed arbitrary code in a regular expression, so obviously accepting a user-supplied regex and matching it allows arbitrary code execution and is a clear security hole. But is this true for all languages that use regular expressions? Is it true for all languages that use "Perl-compatible" regular expressions? In which languages are user-supplied regexes safe to use, and in which languages do they allow arbitrary code execution or other security holes?
Regex support is part of the standard library of many programming languages, including Java and Python, and is built into the syntax of others, including Perl and ECMAScript. Implementations of regex functionality are often called regex engines, and a number of libraries are available for reuse.
Regexes are routinely used in the cybersecurity world by analysts searching logs and other large data files, by data scientists massaging input data files so they can be ingested into machine learning models, by developers validating input fields, and so on.
The Regex class itself is thread safe and immutable (read-only). That is, Regex objects can be created on any thread and shared between threads; matching methods can be called from any thread and never alter any global state.
Regular expressions are useful in any scenario that benefits from full or partial pattern matches on strings. Two common use cases are verifying the structure of strings and extracting substrings from structured strings.
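As a small illustration of those two use cases, here is a minimal Python sketch. The date format and the helper names `is_date` and `extract_dates` are made up for the example and are not taken from any of the sources quoted here.

```python
import re

# Hypothetical format chosen for illustration: dates shaped like 2024-05-17.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def is_date(text):
    """Verify structure: the whole string must have the date shape."""
    return DATE.fullmatch(text) is not None

def extract_dates(text):
    """Extract substrings: return a (year, month, day) tuple per date found."""
    return DATE.findall(text)
```

`fullmatch` requires the entire string to fit the pattern, while `findall` scans for every occurrence inside a longer string.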
User-supplied regexes, like user input in general, should never be treated as safe, regardless of the programming language. If your program does treat them as safe, it is vulnerable to attacks by deliberately crafted inputs.
In the case of regexes, the attack can be ReDoS (Regular expression Denial of Service): basically, a regex that consumes an excessive amount of CPU time and memory to process.
For example, if you try to evaluate this regex
^(([a-z])+.)+[A-Z]([a-z])+$
on this input:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
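you'll notice it may hang. This is called catastrophic backtracking: the nested quantifiers let a backtracking engine split the run of a's in exponentially many ways, and it tries all of them before concluding that there is no match. See it for yourself here: https://regex101.com/r/Qhn3Vb/1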
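If you'd rather reproduce this locally, here is a rough sketch using Python's built-in `re` module, which is a backtracking engine. The helper name `time_failed_match` and the input lengths are my own choices for the demo. Actual timings depend on your machine and Python version, so none are quoted here; the point is only that the time grows quickly as `n` increases.

```python
import re
import time

# The pattern from the example above.
EVIL = re.compile(r"^(([a-z])+.)+[A-Z]([a-z])+$")

def time_failed_match(n):
    """Return the seconds spent trying to match n 'a's followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    # The input has no uppercase letter, so the match always fails.
    assert result is None
    return elapsed

if __name__ == "__main__":
    # Keep n modest: each extra character multiplies the work.
    for n in (10, 15, 20, 25):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```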
Read more about Regex DoS: https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
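One possible mitigation, if you really must evaluate a user-supplied pattern, is to run the match in a separate process and kill it when it exceeds a time budget. The sketch below is only an illustration under some assumptions: the function name `safe_search`, the exit codes, and the default timeout are all invented for this example, and the pattern and text are passed on the command line, so they must be reasonably short and contain no NUL bytes.

```python
import subprocess
import sys

# Child script: exits with 10 on a match and 11 on no match (arbitrary codes
# picked for this sketch). Any other exit code, such as one caused by an
# invalid pattern, is treated as a failure by the parent.
_CHILD = """
import re, sys
rx = re.compile(sys.argv[1])
sys.exit(10 if rx.search(sys.argv[2]) else 11)
"""

def safe_search(pattern, text, timeout=2.0):
    """Return True or False for match or no match, or None if the pattern is
    invalid, the child fails, or the search exceeds the time budget."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return None
    return {10: True, 11: False}.get(proc.returncode)
```

This bounds CPU time per request but costs a process start for every match, so it is a trade-off rather than a complete defense. Engines that do not backtrack avoid this class of problem by design.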
Bottom line: never assume user input is safe!