 

What does \\x80-\\xFF refer to?

Tags: regex, php

In the process of looking for solutions to help sanitise some output, I came across code that does the following.

preg_replace('|[^a-z0-9-~+_.?#=!&;,/:%@$\|*\'()\\x80-\\xff]|i', '', $some_url)

Now, I think it's basically trying to remove anything other than the characters listed above. But doesn't \\x80-\\xff refer to some form of non-printable ASCII characters? If so, why would the code be trying NOT to remove them?

Any indications/pointers/help would be appreciated. Thanks.

Grateful asked Sep 23 '14 04:09


3 Answers

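\x80-\xFF is the range of non-ASCII byte values (128 to 255). Those bytes are still printable text: in Latin-1 most of them are characters in their own right, and in UTF-8 every byte of a multi-byte sequence, which is how code points above U+007F are encoded, falls inside this range.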

Using \\x80 rather than \x80 is slightly more correct, because the backslash escapes itself in PHP strings. That holds in single-quoted strings too, although it makes no practical difference there: '\x80' and '\\x80' produce the same four characters.

In double-quoted strings, however, a bare \x80 would be interpreted by PHP itself and turned into the raw byte 0x80 before the pattern is compiled, whereas \\x80 reaches the regex engine as \x80 and is interpreted there.
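To make the quoting rules above concrete, here is a minimal sketch that compares the four literals and then uses them in a character class. The variable names and the sample subject (a UTF-8 "é") are only illustrative and do not come from the question.

```php
<?php
// Single-quoted: \x is not a PHP escape and \\ collapses to one backslash,
// so both literals hold the same four characters: \ x 8 0
$single_plain   = '\x80';
$single_doubled = '\\x80';
var_dump($single_plain === $single_doubled); // bool(true)
var_dump(strlen($single_plain));             // int(4)

// Double-quoted: PHP decodes \x80 into one raw byte, while \\x80
// leaves the four characters \x80 for the regex engine to decode.
$double_plain   = "\x80";
$double_doubled = "\\x80";
var_dump(strlen($double_plain));   // int(1)
var_dump(strlen($double_doubled)); // int(4)

// Either way the character class ends up covering bytes 0x80-0xFF.
$subject = "\xC3\xA9"; // UTF-8 encoding of "é"
$via_regex_escape = preg_match('/[\\x80-\\xff]/', $subject);
$via_raw_bytes    = preg_match("/[\x80-\xff]/", $subject);
var_dump($via_regex_escape, $via_raw_bytes); // int(1) int(1)
```

Both forms match here because the pattern has no /u modifier, so the engine compares single bytes rather than decoded code points.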

mario answered Oct 19 '22 23:10


Okay, all the answers given so far led me in the right direction and allowed me to find the following in the documentation.

After \x, up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, \x{...} is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater than 127.

So, in summary:

i) '\x' starts a hexadecimal escape sequence, after which up to two hexadecimal digits are read

ii) '\xhh' the two 'hh' letters can be in upper or lower case

iii) '\xhh' specifies a code-point in the range 0-FF

iv) '\x80-\xFF' refers to a character range outside ASCII
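As a rough illustration of point iv), the sketch below runs the pattern from the question against a made-up URL containing a UTF-8 "é", then repeats the call with the \x80-\xff range removed. The URL and the variable names $with_range and $without_range are assumptions for the example only.

```php
<?php
// Hypothetical input: "é" in UTF-8 is the two bytes 0xC3 0xA9,
// both of which fall inside \x80-\xFF.
$some_url = "http://example.com/caf\xC3\xA9?q=<b> 1";

// The pattern from the question. There is no /u modifier,
// so the character class is matched byte by byte.
$with_range = preg_replace(
    '|[^a-z0-9-~+_.?#=!&;,/:%@$\|*\'()\\x80-\\xff]|i',
    '',
    $some_url
);

// The same pattern without the \x80-\xff range, for comparison.
$without_range = preg_replace(
    '|[^a-z0-9-~+_.?#=!&;,/:%@$\|*\'()]|i',
    '',
    $some_url
);

// With the range, only "<", ">" and the space are stripped.
var_dump($with_range === "http://example.com/caf\xC3\xA9?q=b1"); // bool(true)

// Without it, the two bytes of "é" are stripped as well.
var_dump($without_range === "http://example.com/caf?q=b1");      // bool(true)
```

So keeping \x80-\xff in the negated class is what lets non-ASCII text, such as accented letters in UTF-8 or Latin-1, survive the clean-up.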

Grateful answered Oct 19 '22 22:10


You don't need the double backslash in a single-quoted PHP pattern. If you do use it, PHP collapses it to a single backslash, so the regex engine reads the same escape either way.

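One exception: with the nowdoc syntax no escape processing takes place, so a double backslash reaches the regex engine unchanged and is read as a literal backslash. Heredoc, by contrast, behaves like a double-quoted string and collapses \\ to a single backslash.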
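A quick way to check how each syntax treats the double backslash is to compare string lengths. This is a small sketch based on my reading of the PHP manual, which says heredoc processes escapes like a double-quoted string while nowdoc processes none. The PATTERN label and variable names are arbitrary.

```php
<?php
// Nowdoc (quoted label): no escape processing at all,
// so both backslashes survive.
$nowdoc = <<<'PATTERN'
\\x80
PATTERN;

// Heredoc (bare label): parsed like a double-quoted string,
// so \\ becomes a single backslash.
$heredoc = <<<PATTERN
\\x80
PATTERN;

var_dump(strlen($nowdoc));  // int(5): \ \ x 8 0
var_dump(strlen($heredoc)); // int(4): \ x 8 0
```

In a nowdoc pattern the regex engine would therefore see \\x80, that is, a literal backslash followed by the characters x80, rather than the byte 0x80.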

Casimir et Hippolyte answered Oct 20 '22 00:10