 

To double escape or not to double escape in PHP PCRE functions?

Tags:

regex

php

I was looking for a solid article on when double escaping is necessary and when it is not, but I was not able to find anything. Perhaps I didn't look hard enough, because I'm sure there is an explanation out there somewhere, but let's just make it easy to find for the next person who has this question!

Take for example the following regex patterns:

/\n/
/domain\.com/
/myfeet \$ your feet/

Nothing groundbreaking, right? OK, let's use those examples within the context of PHP's preg_match function:

$foo = preg_match("/\n/", $bar);
$foo = preg_match("/domain\.com/", $bar);
$foo = preg_match("/myfeet \$ your feet/", $bar);

To my understanding, a backslash in the context of a quoted string value escapes the following character, and the expression is being given via a quoted string value.

Would the previous be like doing the following, and wouldn't this cause an error?

$foo = preg_match("/n/", $bar);
$foo = preg_match("/domain.com/", $bar);
$foo = preg_match("/myfeet $ your feet/", $bar);

Which is not what I want, right? Those expressions are not the same as the ones above.

Would I not have to write them double escaped like this?

$foo = preg_match("/\\n/", $bar);
$foo = preg_match("/domain\\.com/", $bar);
$foo = preg_match("/myfeet \\$ your feet/", $bar);

So that when PHP processes the string, it turns the escaped backslash into a single backslash, which is then left in when it's passed to the PCRE interpreter?

Or does PHP just magically know that I want to pass that backslash to the PCRE interpreter? I mean, how does it know I'm not trying to \" escape a quote that I want to use in my expression? Or are double backslashes only required when using an escaped quote? And for that matter, would you need to TRIPLE escape a quote? \\\" You know, so that the quote is escaped and a double is left over?

What's the rule of thumb with this?

I just did a test with PHP:

$bar = "asdfasdf a\"ONE\"sfda dsf adsf me & mine adsf asdf asfd ";

echo preg_match("/me \$ mine/", $bar);
echo "<br /><br />";
echo preg_match("/me \\$ mine/", $bar);
echo "<br /><br />";
echo preg_match("/a\"ONE\"/", $bar);
echo "<br /><br />";
echo preg_match("/a\\\"ONE\\\"/", $bar);
echo "<br /><br />";

Output:

0

1

1

1

So, it looks like it doesn't really matter for quotes, but for the dollar sign a double escape is required, as I thought.

Rick Kukiela asked Jan 15 '23 06:01


1 Answer

Double quoted strings

When it comes to escaping inside double quotes, the rule is that PHP will inspect the character(s) immediately following the backslash.

If the next character is one of n, t, r, v, e, f, \, $ or ", or if an octal or hexadecimal value follows it (the full rules are in the PHP manual), the sequence gets evaluated as the corresponding character or as the character with that ordinal value, respectively.

It's important to note that if an invalid escape sequence is given, the expression is not evaluated and both the backslash and character remain. This is different from some other languages where an invalid escape sequence would cause an error instead.

E.g. "domain\.com" will be left as is.

Note that variables get expanded inside double quotes as well, e.g. "$var" needs to be escaped as "\$var".
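The double quote rules above can be checked with a short sketch. The variable names are only illustrative and are not taken from the question:

```php
<?php
// Recognized escape: "\n" becomes one newline character (ASCII 10).
$newline = "\n";

// Unrecognized escape: "\." keeps both the backslash and the dot,
// so the double quoted string equals the single quoted one.
$unknown = "domain\.com";

// "\$" yields a literal dollar sign and suppresses variable expansion.
$dollar = "\$var";

// "\\" yields exactly one backslash.
$backslash = "\\";

var_dump($newline === chr(10));        // bool(true)
var_dump($unknown === 'domain\.com');  // bool(true)
var_dump(strlen($unknown));            // int(11)
var_dump($dollar === '$var');          // bool(true)
var_dump(strlen($backslash));          // int(1)
```

So "/domain\.com/" happens to reach PCRE with its backslash intact, while "/\n/" reaches PCRE as a pattern containing an actual newline character.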

Single quoted strings

Since PHP 5.1.1, any backslash inside a single quoted string (and followed by at least one character) is kept as is, with two exceptions: \\ produces a single backslash and \' produces a single quote. No variables get substituted either. This is by far the most convenient feature of single quoted strings.
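As a minimal sketch of why single quotes are convenient for patterns (the subject strings and variable names here are made up for illustration):

```php
<?php
// The backslash before the dot reaches PCRE untouched.
$pattern = '/domain\.com/';

// No variable expansion: this is the four characters $, v, a, r.
$literal = '$var';

// The only two escapes in single quotes: \\ and \'.
$oneBackslash = '\\';
$oneQuote = '\'';

// The escaped dot matches a literal dot only.
$hit  = preg_match($pattern, 'www.domain.com');  // 1
$miss = preg_match($pattern, 'www.domainXcom');  // 0

var_dump(strlen($literal), strlen($oneBackslash), $oneQuote, $hit, $miss);
```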

Regular expressions

For escaping regular expressions, it's best to leave escaping to preg_quote():

$foo = preg_match('/' . preg_quote('mine & yours', '/') . '/', $bar);

This way you don't have to worry about which characters need to be escaped, so it works well for user input.

See also: preg_quote
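Here is a small sketch of what preg_quote() does to input containing regex metacharacters. The input values are hypothetical; only preg_quote() and preg_match() themselves come from PHP:

```php
<?php
// Hypothetical user input that contains a regex metacharacter.
$input = 'a.b';

// preg_quote() puts a backslash in front of the dot.
$quoted = preg_quote($input, '/');

// Passing the delimiter as the second argument escapes it as well.
$quotedSlash = preg_quote('1/2', '/');

// Unquoted, the dot matches any character, so 'axb' matches too.
$loose = preg_match('/^' . $input . '$/', 'axb');    // 1

// Quoted, only the literal text 'a.b' matches.
$strictMiss = preg_match('/^' . $quoted . '$/', 'axb');  // 0
$strictHit  = preg_match('/^' . $quoted . '$/', 'a.b');  // 1

var_dump($quoted, $quotedSlash, $loose, $strictMiss, $strictHit);
```

Note that preg_quote() is meant for text that should match literally; it is not a way to write a pattern that still needs working metacharacters.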

Update

You added this test:

"/me \$ mine/"

This gets evaluated as "/me $ mine/"; but in PCRE the $ has a special meaning (it's an end-of-subject anchor).

"/me \\$ mine/"

This is evaluated as "/me \$ mine/", so the backslash is escaped for PHP itself, whereas the $ is escaped for PCRE. This only works by accident, btw: the $ is followed by a space, so PHP does not treat it as the start of a variable.

$var = 'something';

"/me \\$var mine/"

This gets evaluated as "/me \something mine/", so you need to escape the $ again.

"/me \\\$var mine/"
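The four cases can be compared side by side in a short sketch. The subject strings are made up for the demonstration; they are not the $bar from the question:

```php
<?php
$var = 'something';

// PHP turns "\$" into "$"; PCRE then sees an end-of-subject anchor.
$anchor = "/me \$ mine/";

// PHP turns "\\" into "\"; the "$" survives only because a space follows.
$literal = "/me \\$ mine/";

// Here "$var" is expanded after the backslash has been produced.
$expanded = "/me \\$var mine/";

// "\\" gives the backslash for PCRE, "\$" stops PHP from expanding $var.
$safe = "/me \\\$var mine/";

var_dump($anchor === '/me $ mine/');             // bool(true)
var_dump($literal === '/me \$ mine/');           // bool(true)
var_dump($expanded === '/me \something mine/');  // bool(true)
var_dump($safe === '/me \$var mine/');           // bool(true)

$anchorHits  = preg_match($anchor, 'me $ mine');   // 0
$literalHits = preg_match($literal, 'me $ mine');  // 1
$safeHits    = preg_match($safe, 'me $var mine');  // 1
var_dump($anchorHits, $literalHits, $safeHits);
```

Writing the pattern in single quotes, as in '/me \$var mine/', gives the same string as the last case with only one backslash.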
Ja͢ck answered Jan 31 '23 23:01