
Escaping escape Characters

I'm trying to mimic the json_encode bitmask flags implemented in PHP 5.3.0, here is the string I have:

$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly

Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following:

"O\\\u0027Rei\\\u0022lly"

And I'm currently doing this in PHP versions older than 5.3.0:

str_replace(array('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))

Which correctly outputs the same result:

"O\\\u0027Rei\\\u0022lly"

I'm having trouble understanding why I need to replace single quotes ('\\\'' or even "\\'" [surrounding quotes excluded]) with '\\\u0027' and not just '\\u0027'.


Here is the code that I'm having trouble porting to PHP < 5.3:

if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
    /* JSON_HEX_APOS and JSON_HEX_QUOT are available */
    if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
    {
        $_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
        $_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
    }

    /* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
    else if (extension_loaded('json') === true)
    {
        $_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
        $_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
        $_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
        $_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
    }

    $_GET = json_decode(stripslashes($_GET));
    $_POST = json_decode(stripslashes($_POST));
    $_COOKIE = json_decode(stripslashes($_COOKIE));
    $_REQUEST = json_decode(stripslashes($_REQUEST));
}
Alix Axel asked May 20 '10 03:05




2 Answers

The PHP string

'O\'Rei"lly'

is just PHP's way of getting the literal value

O'Rei"lly

into a string which can be used. Calling addslashes on that string changes it to be literally the following 11 characters

O\'Rei\"lly

i.e. strlen(addslashes('O\'Rei"lly')) == 11

This is the value which is being sent to json_encode.

In JSON the backslash is the escape character, so a literal backslash must itself be escaped, i.e.

\ to be \\

Single and double quotes can also cause problems, so converting them to their Unicode escapes is one way to avoid that. With the JSON_HEX_APOS and JSON_HEX_QUOT flags, json_encode in PHP 5.3.0 and later changes

' to be \u0027

and

" to be \u0022

So applying these three rules to

O\'Rei\"lly

gives us

O\\\u0027Rei\\\u0022lly
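
As a quick check, the three rules can be applied by hand. This is only a sketch: escape_with_hex_quotes is a hypothetical helper name, and it covers just these three rules, not everything json_encode does.

```php
<?php
// Hypothetical helper: applies only the three rules above to a raw string.
function escape_with_hex_quotes($raw)
{
    // strtr() with an array makes a single pass over the input, so the
    // backslashes it inserts are never escaped a second time.
    return strtr($raw, array(
        '\\' => '\\\\',    // \ becomes \\
        "'"  => '\\u0027', // ' becomes \u0027
        '"'  => '\\u0022', // " becomes \u0022
    ));
}

$s = addslashes('O\'Rei"lly');         // the 11 characters O\'Rei\"lly
echo escape_with_hex_quotes($s), "\n"; // prints O\\\u0027Rei\\\u0022lly
```

A single-pass strtr is used instead of chained str_replace calls because the order of the rules would otherwise matter: escaping backslashes last would also double the ones just added in front of u0027 and u0022.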

This string is then wrapped in double quotes to make it a JSON string. Your search patterns include the leading backslash. Whether by accident or by design, this means the opening and closing double quotes added by json_encode are not matched by the replacement, which is what you want.

So in earlier versions of PHP

$s = addslashes('O\'Rei"lly');
print json_encode($s);

would print

"O\\'Rei\\\"lly"

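We want to change ' to \u0027 and \" to \u0022. The \ in \" is only there to get the " into the JSON string, which itself begins and ends with double quotes, so it goes away together with the quote. The \ in front of ' is different: json_encode does not escape ', so that backslash is the second half of an escaped backslash (\\) and has to stay. Your search pattern \' consumes it, so your replacement has to put it back, which is why you need \\u0027 (the PHP literal '\\\u0027') rather than \u0027.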

So that's why we get

"O\\\u0027Rei\\\u0022lly"
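
Following that reasoning, a pre-5.3 stand-in could search for a bare ' rather than for \', so that no backslash has to be put back. This is a minimal sketch, not a full replacement for the flags: json_encode_hex_quotes is a hypothetical name, and the \" search would also match an escaped backslash that sits directly before a closing quote (a value ending in a backslash), so it only suits input like the slash-escaped strings discussed here.

```php
<?php
// Hypothetical helper: approximates JSON_HEX_APOS | JSON_HEX_QUOT by
// post-processing the output of a plain json_encode() call.
function json_encode_hex_quotes($value)
{
    $json = json_encode($value);

    // \" (2 characters) becomes \u0022: the backslash was the quote's own escape.
    // '  (1 character)  becomes \u0027: json_encode never escapes an apostrophe.
    return str_replace(array('\\"', "'"), array('\\u0022', '\\u0027'), $json);
}

$s = addslashes('O\'Rei"lly');
echo json_encode_hex_quotes($s), "\n"; // prints "O\\\u0027Rei\\\u0022lly" (quotes included)
```

For this input the result is the same as searching for \' and replacing it with \\u0027; the only difference is which side of the replacement accounts for the extra backslash.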
awatts answered Oct 11 '22 22:10


It's escaping the backslash as well as the quote. It's difficult dealing with escaped escapes, as you're doing here, as it quickly turns into backslash counting games. :-/
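
One way to keep the counting manageable is to let PHP report what each literal from the question actually contains. A small sketch, with $literals as a name made up for the example:

```php
<?php
// The search and replacement literals from the question, written exactly
// as they appear in the source code.
$literals = array('\\"', "\\'", '\\\'', '\\u0022', '\\\u0027');

foreach ($literals as $literal) {
    // Echo the real characters next to their count.
    echo $literal, ' => ', strlen($literal), " characters\n";
}
// \" => 2 characters
// \' => 2 characters
// \' => 2 characters
// \u0022 => 6 characters
// \\u0027 => 7 characters
```

The last line is the one that matters here: '\\\u0027' holds two backslashes, one more than '\\u0022'.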

staticsan answered Oct 11 '22 23:10