
Why do 3 backslashes equal 4 backslashes in PHP?

<?php
$a='/\\\/';
$b='/\\\\/';
var_dump($a);//string '/\\/' (length=4)
var_dump($b);//string '/\\/' (length=4)
var_dump($a===$b);//boolean true
?>

Why is the string with 3 backslashes equal to the string with 4 backslashes in PHP?

And can we use the 3-backslash version in a regular expression?

The PHP reference says we must use 4 backslashes.

Note: Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.

oliver asked Mar 18 '23 07:03


2 Answers

$b='/\\\\/';

PHP parses the string literal (more or less) character by character. The first input symbol is the forward slash. It produces a forward slash in the result of the parsing step, and that one character is consumed from the input.
The next input symbol is a backslash. It is consumed and the next character is inspected. It is also a backslash. That is a valid combination, so the second symbol is consumed as well and the result is a single backslash (for both input symbols).
The same happens with the third and fourth backslash.
The last input symbol (within the literal) is the forward slash -> forward slash in the result.
-> /\\/

Now for the string with three backslashes:

$a='/\\\/';

PHP "finds" the first backslash; the next character is a backslash. That's a valid combination, resulting in one single backslash in the result, and both characters are consumed from the input literal. PHP then "finds" the third backslash; the next character is a forward slash, which is not a valid combination. So the result is a single backslash (because PHP loves and forgives you...) and only one character is consumed from the input. The next input character is the forward slash, resulting in a forward slash in the result.
-> /\\/

=> both literals encode the same string.
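The walk-through above can be sketched as a tiny decoder. This is only an illustration of the rule, not PHP's actual lexer; the function name decodeSingleQuoted is made up for this example, and the inputs are built with chr(92) (a backslash) so that the example itself needs no escaping.

```php
<?php
// Hypothetical helper: decodes the text between the quotes of a
// single-quoted PHP literal, following the rules described above.
function decodeSingleQuoted(string $body): string
{
    $out = '';
    $len = strlen($body);
    for ($i = 0; $i < $len; $i++) {
        $ch   = $body[$i];
        $next = $i + 1 < $len ? $body[$i + 1] : null;
        if ($ch === chr(92) && ($next === chr(92) || $next === "'")) {
            // Valid combination: consume both characters, emit the second one.
            $out .= $next;
            $i++;
        } else {
            // Anything else, including a lone backslash, stands for itself.
            $out .= $ch;
        }
    }
    return $out;
}

$bs    = chr(92);                          // one backslash
$three = '/' . str_repeat($bs, 3) . '/';   // the source text of $a's literal
$four  = '/' . str_repeat($bs, 4) . '/';   // the source text of $b's literal

var_dump(decodeSingleQuoted($three) === decodeSingleQuoted($four)); // bool(true)
var_dump(strlen(decodeSingleQuoted($three)));                        // int(4)
```

Both inputs decode to a slash, two backslashes and a slash, which is the 4-character string that var_dump reported in the question.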

VolkerK answered Mar 29 '23 10:03


It is explained in the documentation on the page about Strings:

Under the Single quoted section it says:

The simplest way to specify a string is to enclose it in single quotes (the character ').

To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash.

Let's try to interpret your strings:

$a='/\\\/';

The forward slashes (/) have no special meaning in PHP strings, they represent themselves.
The first backslash (\) escapes the second backslash, as explained in the second sentence of the second paragraph quoted above.
The third backslash stands for itself, as explained in the last sentence of the above quote, because it is not followed by an apostrophe (') or a backslash (\).

As a result, the variable $a contains this string: /\\/.

In

$b='/\\\\/';

there are two backslashes (the second and the fourth) that are escaped by the first and the third backslash. The final (runtime) string is the same as for $a: /\\/.
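One way to check this without relying on the escaping rules themselves is to compare against a nowdoc, which performs no escape processing at all. A minimal sketch (the variable name $raw and the marker TXT are arbitrary):

```php
<?php
$a = '/\\\/';
$b = '/\\\\/';

// Nowdoc does no escape processing, so $raw is exactly the 4 characters typed.
$raw = <<<'TXT'
/\\/
TXT;

var_dump($a === $raw, $b === $raw);  // bool(true) bool(true)
echo bin2hex($a), "\n";              // 2f5c5c2f
```

The hex dump shows one slash (2f), two backslashes (5c 5c) and one slash (2f).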

Note

The discussion above is about the encoding of strings in PHP source. As you can see, there is always more than one (correct) way to encode the same string. Another option (besides string literals enclosed in single or double quotes, and heredoc or nowdoc syntax) is to use constants (for literal backslashes, for example) and build the strings from pieces.

For example:

define('BS', '\\');      // must be written '\\': in '\' the backslash would escape the closing quote
$c = '/'.BS.BS.'/';

This escapes a backslash only once, in the definition of the constant. The constant BS contains a single literal backslash and is used wherever a backslash is needed for its own value. Where a backslash is needed for escaping, a real backslash must be used (there is no way to use BS for that).

The escaping in regex is a different thing. First, the regex is parsed at runtime, and at runtime $a, $b and $c above all contain /\\/, no matter how they were generated.

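Then, in regex a backslash followed by a non-alphanumeric character makes that character literal, so /\\/ matches one backslash; in front of a character that has no special meaning the backslash simply has no effect (note the difference from above: in a single-quoted PHP string such a backslash is kept as a literal backslash).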
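To answer the regex part of the question, here is a small sketch (the subject string is invented for illustration). Both literals yield the same pattern, so both match a single backslash. The last line shows a case where dropping a backslash is not harmless: in double quotes, \n is itself an escape sequence.

```php
<?php
$subject = 'C:\\temp';   // runtime value: C:\temp (one backslash)

// Both literals produce the pattern /\\/, which PCRE reads as one literal backslash.
var_dump(preg_match('/\\\\/', $subject)); // int(1)
var_dump(preg_match('/\\\/', $subject));  // int(1)

// Pitfall in double quotes: "\\\n" is a backslash followed by a newline,
// while "\\\\n" is two backslashes followed by the letter n.
var_dump("\\\n" === "\\\\n");             // bool(false)
```

So the 3-backslash form works here, but the 4-backslash form recommended by the manual does not depend on which character happens to follow.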

Combining PHP & regex

There are endless possibilities to make things complicated. Let's try to keep them simple and set some guidelines for regex in PHP:

  • enclose the regex string in apostrophes (') whenever possible; this way only two characters need to be escaped for PHP: the apostrophe and the backslash;
  • when parsing URLs, paths or other strings that can contain forward slashes (/), use #, ~, ! or @ as the regex delimiter (whichever one is not used in the regex itself); this way there is no need to escape the delimiter when it is used inside the regex;
  • don't escape characters in the regex when it's not needed; for example, the dash (-) has a special meaning only inside character classes; outside them it's useless to escape it (and even in a character class it can be used unescaped, without any special meaning, if it is placed as the very first or the very last character inside the [...] enclosure).
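A short sketch of these guidelines in practice; the path and the pattern are invented for illustration:

```php
<?php
$path = '/usr/local/bin/php';   // sample input, chosen for illustration

// '#' as delimiter: the forward slashes in the pattern need no escaping.
// The dash is placed last in the character class, so it is literal there.
var_dump(preg_match('#^/usr/(?:local/)?bin/([\w.-]+)$#', $path, $m)); // int(1)
var_dump($m[1]);                                                      // string(3) "php"

// The same pattern with '/' as delimiter needs every slash escaped.
var_dump(preg_match('/^\/usr\/(?:local\/)?bin\/([\w.-]+)$/', $path)); // int(1)
```

Because the patterns are in single quotes, sequences such as \w and \/ reach the regex engine unchanged; only a literal backslash or apostrophe would need escaping for PHP.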
axiac answered Mar 29 '23 09:03