I'd like to take user input, denoted as $dangerous_string, and use it as part of a RegEx in a MySQL query.
What's the best way to go about doing this? I want to use the user's string as a literal -- if it contains any characters that mean something in MySQL RegEx, those characters should not actually affect my Regular Expression.
$dangerous_string = $_GET["string"];
//do something here
$dangerous_string = what_goes_here($dangerous_string);
$sql = "SELECT * FROM table WHERE search_column REGEX '" . $mysqli->real_escape_string("[[:<:]]$dangerous_string") . "'";
//etc....
Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .
MySQL supports another type of pattern matching operation based on the regular expressions and the REGEXP operator. It provide a powerful and flexible pattern match that can help us implement power search utilities for our database systems.
MySQL only has one operator that allows you to work with regular expressions. This is the REGEXP operator, which works just like the LIKE operator, except that instead of using the _ and % wildcards, it uses a POSIX Extended Regular Expression (ERE).
Basically, LIKE does very simple wildcard matches, and REGEX is capable of very complicated wildcard matches. In fact, regular expressions ( REGEX ) are so capable that they are [1] a whole study in themselves [2] an easy way to introduce very subtle bugs.
preg_quote
and replace &
manuallypreg_quote
takes a string and escapes special characters with a backslash. It is meant for PHP regexes, not MySQL regexes, and it does not escape &
, which is needed for MySQL. So we only need to modify it like so:
function escape_regex_for_mysql($dangerous_string) {
return preg_replace('/&/', '\\&', preg_quote($dangerous_string));
}
Note that you should still use prepared statements (or $mysqli->real_escape_string
) on top, like this:
$query = $wpdb->prepare(
'SELECT * FROM table WHERE search_column REGEXP %s',
'[[:<:]]' . escape_regex_for_mysql($dangerous_string)
);
Let's look at MySQL's documentation to see which characters will need to be escaped. The documentation says:
MySQL >= 8.0.4 implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe.
So let's look at the documentation for the ICU project:
\ (outside of sets) Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ .
\ (inside sets) Quotes the following character. Characters that must be quoted to be treated as literals are [ ] \ Characters that may need to be quoted, depending on the context are - &
So the list of special characters that need to be escaped are * ? + [ ] ( ) { } ^ $ \ . - &
. preg_quote
escapes all of these except &
. It also escapes some characters unnecessarily, but MySQL will handle that as expected.
Alternatively, you can add &
as a second parameter to preg_quote
for it to be escaped, e.g. preg_quote('AT&T', '&')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With