json_encode(): Invalid UTF-8 sequence in argument

I'm calling json_encode() on data that comes from a MySQL database with utf8_general_ci collation. The problem is that some rows have weird data which I can't clean. For example symbol , so once it reaches json_encode(), it fails with json_encode(): Invalid UTF-8 sequence in argument.

I've tried utf8_encode() and utf8_decode(), and even checked with mb_check_encoding(), but the bad data keeps getting through and causing havoc.

Running PHP 5.3.10 on Mac. So the question is: how can I strip the invalid UTF-8 sequences while keeping the rest of the data, so that json_encode() works?

Update. Here is a way to reproduce it:

echo json_encode(pack("H*", 'c32e'));
Artjom Kurapov asked Apr 18 '12 08:04


6 Answers

I had a similar error, which caused json_encode to return a null field whenever a string contained a high-ASCII character such as a curly apostrophe, because the query was returning the wrong character set.

The solution was to make sure the data comes back as UTF-8 by adding:

mysql_set_charset('utf8');

after the mysql connect statement.

Robert Imhoff answered Nov 07 '22 17:11


It seems the symbol was Å. Since the data consists of surnames that shouldn't be public, only the first letter was shown, and that was done with just $lastname[0], which is wrong for multibyte strings and caused the whole hassle. Changing it to mb_substr($lastname, 0, 1) works like a charm.
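To illustrate why byte indexing breaks here, a minimal sketch follows. It assumes a hypothetical surname "Åberg" stored as UTF-8 and the mbstring extension being available; the variable names are illustrative and not taken from the original code. Note that the bytes c3 2e in the question's reproduction are consistent with the first byte of a two-byte character followed by a period.

```php
<?php
// Hypothetical sample value; "Å" is encoded as two bytes (0xC3 0x85) in UTF-8.
$lastname = "\xC3\x85berg";

// Byte indexing returns only the first byte of the two-byte character.
$firstByte = $lastname[0];

// mb_substr() counts characters, so it returns the whole "Å".
$firstChar = mb_substr($lastname, 0, 1, 'UTF-8');

var_dump(bin2hex($firstByte));                     // string(2) "c3"
var_dump(mb_check_encoding($firstByte, 'UTF-8'));  // bool(false)
var_dump(bin2hex($firstChar));                     // string(4) "c385"
var_dump(mb_check_encoding($firstChar, 'UTF-8'));  // bool(true)
```

The lone 0xC3 byte is a lead byte with no continuation byte, which is the kind of fragment json_encode() rejects.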

Artjom Kurapov answered Nov 07 '22 19:11


The problem is that this character is UTF-8, but json_encode does not handle it correctly. What's more, there is a list of other characters (see Unicode characters list) that will trigger the same error, so stripping out just this one (Å) will not fully fix the issue.

What we did was convert these characters to HTML entities like this:

htmlentities( (string) $value, ENT_QUOTES, 'utf-8', FALSE);
serge.k answered Nov 07 '22 19:11


Make sure that your connection charset to MySQL is UTF-8. It often defaults to ISO-8859-1 which means that the MySQL driver will convert the text to ISO-8859-1.

You can set the connection charset with mysql_set_charset, mysqli_set_charset, or the query SET NAMES 'utf8' (MySQL's name for the charset is utf8, without a hyphen).

Emil Vikström answered Nov 07 '22 17:11


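This code might help. It solved my problem! Note that the result has to be assigned back, since the function returns the converted string:

$post["post"] = mb_convert_encoding($post["post"], 'UTF-8', 'UTF-8');

or, more generally:

$string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');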
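As a rough sketch of how this applies to the reproduction case from the question, the snippet below runs the same conversion on the bytes c3 2e and then encodes the result. It assumes the mbstring extension is enabled; the variable names $dirty and $clean are illustrative. The exact replacement text depends on mbstring's substitute-character setting, so no specific output is claimed here.

```php
<?php
// The invalid sequence from the question: 0xC3 (a lead byte) followed by ".".
$dirty = pack('H*', 'c32e');

// Converting from UTF-8 to UTF-8 makes mbstring replace bytes that do not
// form valid UTF-8 with its substitute character. The return value must be kept.
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

var_dump(mb_check_encoding($dirty, 'UTF-8')); // validity before cleaning
var_dump(mb_check_encoding($clean, 'UTF-8')); // validity after cleaning
var_dump(json_encode($clean));                // the cleaned string encodes as JSON
```

On PHP 7.2 or later, json_encode() also accepts the JSON_INVALID_UTF8_SUBSTITUTE and JSON_INVALID_UTF8_IGNORE flags, which deal with invalid bytes inside the encoder; those flags are not available on the PHP 5.3.10 mentioned in the question.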
Can Uludağ answered Nov 07 '22 17:11


The symbol you posted is the placeholder symbol for a broken byte sequence. Basically, it's not a real symbol but an error in your string.

What is the exact byte value of the symbol? Blindly applying utf8_encode is not a good idea; it's better to first find out where the byte(s) came from and what they mean.
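One way to answer that question is to dump the raw bytes instead of relying on how a terminal or browser renders them. The sketch below is only an illustration using the bytes from the question's reproduction case; the variable names are hypothetical.

```php
<?php
// Bytes from the question's reproduction case.
$value = pack('H*', 'c32e');

// bin2hex() shows the raw bytes regardless of how they are rendered.
$hex = bin2hex($value);

// Split into one two-digit group per byte for easier reading.
$bytes = implode(' ', str_split($hex, 2));

echo $bytes, "\n";                             // c3 2e
var_dump(mb_check_encoding($value, 'UTF-8'));  // bool(false)
```

Here 0xC3 is a UTF-8 lead byte that must be followed by a continuation byte, but 0x2E is a plain ASCII period, so the sequence is broken rather than being a real character.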

Evert answered Nov 07 '22 19:11