PHP Unicode in JSON

Tags: json, php, unicode

I'm sending a JSON POST body to my PHP web service that looks something like this:

{
    "foo": "☺"
}

When I echo out the body in the PHP, I see this:

{
    "foo":"\xe2\x98\xba"
}

I've also tried sending the \uXXXX equivalent:

{
    "foo": "\u263a"
}

This got further: the raw JSON string received contained "foo":"\\u263a", but after json_decode the value became \xe2\x98\xba.

This is causing problems when I come to use the value in a JSON response. I get:

json_encode(): Invalid UTF-8 sequence in argument

At its simplest, this is what happens when I try to JSON encode the string:

> php -r 'echo json_encode("\x98\xba\xe2");'
PHP Warning:  json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

My question is: how can I best get this smiley face from one end of my application to the other?

I'd appreciate any help you could offer.

Ross McFarlane asked Jun 03 '13 11:06


2 Answers

I believe this is the correct behavior of json_encode. If you use the following:

<script>
    alert(
     <?php
       $a = "☺";
       echo json_encode($a);
     ?>
    );
</script>

The HTML output will be alert("\u263a"); and the alert will show ☺, since "\u263a" is a correct representation of the string in JavaScript.

Passing the JSON_UNESCAPED_UNICODE constant as the second parameter of json_encode is also an option, but it is only available in PHP 5.4.0 or newer.
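
As a rough sketch of the two output forms described above (the variable names are only for illustration), the default escaped output and the JSON_UNESCAPED_UNICODE output can be compared side by side:

```php
<?php
// U+263A (WHITE SMILING FACE) written as its three UTF-8 bytes.
$smiley = "\xe2\x98\xba";

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($smiley);

// PHP 5.4.0+: keep the character as literal UTF-8 in the output.
$literal = json_encode($smiley, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // "\u263a"
echo $literal, "\n"; // "☺"

// Both forms decode back to the same three bytes.
var_dump(json_decode($escaped) === json_decode($literal)); // bool(true)
```

Either form is valid JSON, so the choice mostly affects readability and payload size rather than correctness.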

In what scenario do you intend to use the value?


Edit:

php -r 'echo json_encode("\x98\xba\xe2");'

PHP Warning: json_encode(): Invalid UTF-8 sequence in argument in Command line code on line 1

The problem is that the bytes in your test string are in the wrong order. It should be

echo json_encode("\xe2\x98\xba"); // this works for me

instead of

echo json_encode("\x98\xba\xe2"); 
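
One way to check the byte order before encoding is sketched below. The helper name is_valid_utf8 is hypothetical, and the json_encode() behaviour shown (returning false without a warning) is what I would expect on PHP 7 and later; older versions emit the warning quoted above instead.

```php
<?php
$valid   = "\xe2\x98\xba"; // lead byte E2, then continuation bytes 98 BA
$invalid = "\x98\xba\xe2"; // starts with a continuation byte

// Hypothetical helper: with the /u modifier, preg_match() returns false
// (not 0 or 1) when the subject is not valid UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($valid));   // bool(true)
var_dump(is_valid_utf8($invalid)); // bool(false)

// On PHP 7+, encoding the invalid string fails and records a UTF-8 error.
$result = json_encode($invalid);
var_dump($result === false);                      // bool(true)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)
```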
Mifeet answered Sep 25 '22 04:09


PHP's json_decode() function behaves correctly given your input case, returning the sequence of UTF-8 bytes (E2 98 BA) that represent the character.
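
A minimal sketch of that behaviour, assuming a request body like the one in the question (the variable names are illustrative), is to dump the decoded value as hex instead of printing it raw:

```php
<?php
// Single quotes keep \u263a as a literal JSON escape, not a PHP escape.
$body = '{"foo": "\u263a"}';

$data = json_decode($body, true);

// The decoded value is the UTF-8 encoding of U+263A: three bytes.
echo bin2hex($data['foo']), "\n"; // e298ba
echo strlen($data['foo']), "\n";  // 3
```

Inspecting the value with bin2hex() sidesteps any escaping that a terminal or log writer might apply on the way out.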

However, when the value is written to the error log (as you did for testing purposes using error_log()), Apache HTTPD applies the \x escaping (in function ap_escape_logitem()) before writing the line. As noted in file server/gen_test_char.c, "all [...] 8-bit chars with the high bit set" are escaped.
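
To see why the logged text looks different from the actual value, here is an illustrative sketch. The helper escape_high_bytes is hypothetical: it imitates only the \x escaping of high-bit bytes and is not Apache's ap_escape_logitem(), which has other rules as well.

```php
<?php
// Hypothetical helper: replaces every byte with the high bit set
// (0x80-0xFF) by a printable \xNN sequence, as a log writer might.
function escape_high_bytes($s)
{
    return preg_replace_callback(
        '/[\x80-\xff]/',
        function ($m) {
            return sprintf('\\x%02x', ord($m[0]));
        },
        $s
    );
}

$value = json_decode('"\u263a"'); // the real value: 3 UTF-8 bytes

echo strlen($value), "\n";                    // 3
echo escape_high_bytes($value), "\n";         // \xe2\x98\xba
echo strlen(escape_high_bytes($value)), "\n"; // 12
```

The escaped form is 12 printable ASCII characters, while the value held by PHP is still the original 3 bytes, so the string itself is intact.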

PleaseStand answered Sep 22 '22 04:09