
json_encode returns false when dealing with multibyte substring

I am using a fairly recent version of PHP (5.5.11) and have run into a problem: when I call json_encode on a substring of a string, it returns false. At first I was using substr, but then I realized that it is wrong for non-English (multibyte) strings. Even after switching to mb_substr, json_encode still returns false:

$s = "に搭載されるようになると、その手軽さからJは急速に普及していく。、通信に関する標準を策定する国際団体インターナショナル";
$a = mb_substr($s, 0, 10);

As you can see,

var_dump( json_encode([
    'd' => $a
]) );

returns false, and

var_dump( json_encode([
    'd' => $s
]) );

returns correct JSON.

json_last_error reports the cause as "Malformed UTF-8 characters, possibly incorrectly encoded", so the problem is that mb_substr gives me a malformed string.

var_dump($a); produces string(10) "に搭載�" (I assume that each Japanese character is 3 bytes and that the question mark is a malformed character).
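The 10-byte result can be reproduced independently of the server configuration. The snippet below is a minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; forcing 'ISO-8859-1' is only a stand-in for whatever non-UTF-8 internal encoding the original setup used, and the sample string is shortened.

```php
<?php
$s = "に搭載されるようになると、その手軽さからJは急速に普及していく。";

// Assumption: a single-byte encoding mimics the misconfigured default,
// so "10 characters" is counted as 10 bytes.
$a = mb_substr($s, 0, 10, 'ISO-8859-1');

var_dump(strlen($a));                             // int(10): 3 + 3 + 3 bytes, plus 1 stray byte
var_dump(mb_check_encoding($a, 'UTF-8'));         // bool(false): the last character is cut off
var_dump(json_encode(['d' => $a]));               // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)
```

Checking a string with mb_check_encoding() before encoding it is one way to catch this kind of truncation early.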

So how can I take a substring without ending up with a malformed string?

Salvador Dali asked May 04 '14 10:05



1 Answer

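Pass the encoding explicitly as the fourth parameter of mb_substr(). Without it, mb_substr() falls back to the internal encoding (see mb_internal_encoding()), which on this setup is evidently not UTF-8: the 10 "characters" are counted as 10 bytes, so the slice ends in the middle of a character.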

$a = mb_substr($s, 0, 10, 'utf-8');
echo $a; // に搭載されるようにな
echo json_encode($a); // "\u306b\u642d\u8f09\u3055\u308c\u308b\u3088\u3046\u306b\u306a"
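If many mb_* calls are involved, an alternative to repeating the fourth parameter is to set the internal encoding once. The following is a rough sketch, assuming a UTF-8 source file and the mbstring extension; the shortened sample string and the error-handling branch are illustrative and not part of the original answer.

```php
<?php
// Set the default encoding for all subsequent mb_* calls.
mb_internal_encoding('UTF-8');

$s = "に搭載されるようになると、その手軽さからJは急速に普及していく。";
$a = mb_substr($s, 0, 10);  // now counts characters, not bytes

var_dump(mb_strlen($a));    // int(10) characters
var_dump(strlen($a));       // int(30) bytes, 3 per character here

$json = json_encode(['d' => $a]);
if ($json === false) {
    // json_last_error_msg() gives a readable reason for the failure.
    echo json_last_error_msg(), "\n";
} else {
    echo $json, "\n";
}
```

Passing the encoding per call keeps each call self-documenting, while the global setting is shorter but depends on nothing else in the application changing it.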


Shankar Narayana Damodaran answered Nov 06 '22 06:11
