Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the best way to escape user input for Regular Expressions in MySQL?

I'd like to take user input, denoted as $dangerous_string, and use it as part of a RegEx in a MySQL query.

What's the best way to go about doing this? I want to use the user's string as a literal -- if it contains any characters that mean something in MySQL RegEx, those characters should not actually affect my Regular Expression.

$dangerous_string = $_GET["string"];
//do something here
$dangerous_string = what_goes_here($dangerous_string);
$sql = "SELECT * FROM table WHERE search_column REGEX '" . $mysqli->real_escape_string("[[:<:]]$dangerous_string") . "'";

//etc....
like image 631
Sambo Avatar asked Sep 23 '10 20:09

Sambo


People also ask

How do you escape expressions in regex?

Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .

Can I use regex in MySQL?

MySQL supports another type of pattern matching operation based on the regular expressions and the REGEXP operator. It provide a powerful and flexible pattern match that can help us implement power search utilities for our database systems.

What flavor of regex does MySQL use?

MySQL only has one operator that allows you to work with regular expressions. This is the REGEXP operator, which works just like the LIKE operator, except that instead of using the _ and % wildcards, it uses a POSIX Extended Regular Expression (ERE).

What is the difference between like and regex operators in MySQL?

Basically, LIKE does very simple wildcard matches, and REGEX is capable of very complicated wildcard matches. In fact, regular expressions ( REGEX ) are so capable that they are [1] a whole study in themselves [2] an easy way to introduce very subtle bugs.


1 Answers

Use preg_quote and replace & manually

preg_quote takes a string and escapes special characters with a backslash. It is meant for PHP regexes, not MySQL regexes, and it does not escape &, which is needed for MySQL. So we only need to modify it like so:

function escape_regex_for_mysql($dangerous_string) {
    return preg_replace('/&/', '\\&', preg_quote($dangerous_string));
}

Note that you should still use prepared statements (or $mysqli->real_escape_string) on top, like this:

$query = $wpdb->prepare(
    'SELECT * FROM table WHERE search_column REGEXP %s',
    '[[:<:]]' . escape_regex_for_mysql($dangerous_string)
);

Rationale:

Let's look at MySQL's documentation to see which characters will need to be escaped. The documentation says:

MySQL >= 8.0.4 implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe.

So let's look at the documentation for the ICU project:

\ (outside of sets) Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ .

\ (inside sets) Quotes the following character. Characters that must be quoted to be treated as literals are [ ] \ Characters that may need to be quoted, depending on the context are - &

So the list of special characters that need to be escaped are * ? + [ ] ( ) { } ^ $ \ . - &. preg_quote escapes all of these except &. It also escapes some characters unnecessarily, but MySQL will handle that as expected.

Alternatively, you can add & as a second parameter to preg_quote for it to be escaped, e.g. preg_quote('AT&T', '&')

like image 98
Flimm Avatar answered Oct 20 '22 20:10

Flimm