
Extra backslash needed in PHP regexp pattern

Tags: regex, php, pcre

While testing an answer to another user's question, I found something I don't understand. The problem was to replace every literal \t, \n, or \r sequence in a string (a backslash followed by the letter, not the control character) with a single space.

Now, the first pattern I tried was:

/(?:\\[trn])+/

which, surprisingly, didn't work. The same pattern worked fine in Perl. After some trial and error I found that PHP needs 3 or 4 backslashes for the pattern to match, as in:

/(?:\\\\[trn])+/

or

/(?:\\\[trn])+/

To my surprise, both of these patterns work. Why are the extra backslashes necessary?

Matteo Riva asked Jan 27 '10 09:01



1 Answer

You need 4 backslashes in the PHP source to match 1 literal backslash, because the pattern is unescaped twice:

  • the PHP string parser turns each \\ into a single \ ("\\\\" -> \\)
  • the regex engine then reads \\ as one literal backslash (\\ -> \)
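
The two stages above can be seen by printing the pattern after PHP has parsed it and then running it. This is only a minimal sketch: the variable names and the sample input are made up for illustration and are not from the original question.

```php
<?php
// Stage 1: the PHP string parser. In a single-quoted literal, \\ becomes \.
$pattern = '/(?:\\\\[trn])+/';   // four backslashes typed in the source

echo $pattern, "\n";             // prints /(?:\\[trn])+/  (two backslashes left)

// Stage 2: PCRE. It reads \\ as one literal backslash, followed by [trn].
// The sample input uses single quotes, so \t, \n and \r stay as two-character
// sequences (backslash + letter) instead of becoming control characters.
$input  = 'a\tb\n\rc';
$result = preg_replace($pattern, ' ', $input);

echo $result, "\n";              // prints a b c
```

Note that the adjacent \n\r pair collapses into one space because of the + quantifier on the group.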

From the PHP documentation on string literals:

escaping any other character will result in the backslash being printed too

Hence for \\\[:

  • the string parser turns the first \\ into \, and the third backslash stays because \[ is not a valid string escape ("\\\[" -> \\[)
  • the regex engine then reads \\ as one literal backslash, leaving a backslash followed by the character class (\\[ -> \[)

Yes, it works, but it is not good practice, because it relies on the string parser leaving an unrecognized escape untouched.
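
To check this, the two-, three- and four-backslash versions can be compared directly. This is a small sketch with hypothetical variable names. It shows that the three- and four-backslash literals are the same string once PHP has parsed them, while the two-backslash version reaches the regex engine as an escaped [ and so only matches the literal text [trn].

```php
<?php
$two   = '/(?:\\[trn])+/';     // parsed string: /(?:\[trn])+/   (escaped bracket)
$three = '/(?:\\\[trn])+/';    // parsed string: /(?:\\[trn])+/
$four  = '/(?:\\\\[trn])+/';   // parsed string: /(?:\\[trn])+/

var_dump($three === $four);    // bool(true): identical after string parsing

$input = 'a\tb';               // single quotes: literal backslash followed by t

var_dump(preg_match($two, $input));     // int(0): looks for the text "[trn]"
var_dump(preg_match($three, $input));   // int(1)
var_dump(preg_match($four, $input));    // int(1)
```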

kennytm answered Sep 21 '22 02:09