
Arabic Characters in JSON decoding [duplicate]

Tags:

json

php

$test = json_encode('بسم الله');
echo $test;

This code outputs "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647", but I expected something like "بسم الله". The Arabic characters are escaped when the string is JSON-encoded, whereas the YouTube API does not escape them: http://gdata.youtube.com/feeds/api/videos/RqMxTnTZeNE?v=2&alt=json

In the YouTube response, the Arabic characters are displayed properly. What am I doing wrong?

Note: I'm working on an API; the example above is just for illustration.

Mohamed Said asked Feb 20 '13 12:02


People also ask

Does JSON support Arabic?

Yes. JSON text is Unicode, so Arabic characters can appear either literally or as \uXXXX escapes.

What is the encoding for Arabic language?

Arabic characters in common use fit in a single UTF-16 code unit (2 bytes). In UTF-8 they take 2 bytes each in the basic Arabic block (U+0600 to U+06FF) and 3 bytes each in the Arabic presentation forms, so for text made up only of Arabic letters UTF-16 is never larger than UTF-8 and is sometimes smaller.

Can JSON have UTF-8?

The JSON spec requires decoders to support UTF-8. As a result, all JSON decoders can handle literal UTF-8 just as well as they can handle the numeric escape sequences. This is also the case for JavaScript interpreters, which means JSONP will handle UTF-8 encoded JSON as well.

Is Unicode allowed in JSON?

JSON data always uses the Unicode character set.


3 Answers

"\u0628\u0633\u0645 \u0627\u0644\u0644\u0647" and "بسم الله" are equivalent in JSON.

By default, PHP writes non-ASCII characters as Unicode escapes instead of literals.

You can change this with the JSON_UNESCAPED_UNICODE flag (provided you are using PHP 5.4 or later).

json_encode('بسم الله', JSON_UNESCAPED_UNICODE);
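
The equivalence of the two forms can be checked by decoding both. This is a minimal sketch, assuming the source file is saved as UTF-8; the variable names are illustrative only.

```php
<?php

// The same JSON string written two ways: with \u escapes and with literal UTF-8.
// Single quotes keep the backslashes intact so json_decode() sees the escapes.
$escaped = '"\u0628\u0633\u0645 \u0627\u0644\u0644\u0647"';
$literal = '"بسم الله"';

// json_decode() turns both documents into the same PHP string.
$fromEscaped = json_decode($escaped);
$fromLiteral = json_decode($literal);

var_dump($fromEscaped === $fromLiteral);
```

Any client that parses the response with a JSON decoder will therefore see the same text, whichever form the server sends.
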
Quentin answered Sep 26 '22 05:09


That is the correct JSON-encoded version of the UTF-8 string. There is no need to change it, because it represents the same string; JSON allows characters to be escaped this way.

JSON can also represent UTF-8 characters directly. Since PHP 5.4 you can set the JSON_UNESCAPED_UNICODE flag to produce raw UTF-8 strings:

json_encode($string, JSON_UNESCAPED_UNICODE)

But that is only a preference; it is not necessary.
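
To make the trade-off concrete, the sketch below encodes the same string both ways and compares the results. It assumes PHP 5.4 or later and a UTF-8 source file; the variable names are illustrative.

```php
<?php

$text = 'بسم الله';

// Default output: every non-ASCII character becomes a six-byte \uXXXX escape.
$escaped = json_encode($text);

// With the flag: the characters are emitted as raw UTF-8 (two bytes each here).
$raw = json_encode($text, JSON_UNESCAPED_UNICODE);

// Both payloads decode to the original string; only their size differs.
var_dump(json_decode($escaped) === json_decode($raw));
printf("escaped: %d bytes, raw: %d bytes\n", strlen($escaped), strlen($raw));
```

The unescaped form is shorter and easier to read in logs, while the escaped form is pure ASCII and survives transports that mishandle UTF-8.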

deceze answered Sep 25 '22 05:09


Both formats are valid and equivalent JSON strings:

char
    any-Unicode-character-
        except-"-or-\-or-
        control-character
    \"
    \\
    \/
    \b
    \f
    \n
    \r
    \t
    \u four-hex-digits

If you prefer the unencoded version, simply add the JSON_UNESCAPED_UNICODE flag:

<?php

$test = json_encode('بسم الله', JSON_UNESCAPED_UNICODE);
echo $test;

This flag requires PHP 5.4.0 or later.
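
If the code may also run on an older PHP, one possible approach is to use the flag only when it is defined. This is a sketch, not a required pattern; `encode_readable` is a hypothetical helper name, not a PHP built-in.

```php
<?php

// Hypothetical helper: use JSON_UNESCAPED_UNICODE when the running PHP defines it,
// otherwise fall back to the default escaped output, which is still valid JSON.
function encode_readable($value)
{
    if (defined('JSON_UNESCAPED_UNICODE')) {
        return json_encode($value, JSON_UNESCAPED_UNICODE);
    }

    return json_encode($value);
}

echo encode_readable('بسم الله'), "\n";
```

Either branch yields JSON that decodes to the same string, so callers do not need to know which one was taken.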

Álvaro González answered Sep 22 '22 05:09