
UTF-8 character encoding battles json_encode() [duplicate]

Quest

I am looking to fetch rows that have accented characters. The collation for the column (NAME) is latin1_swedish_ci, so its character set is latin1.

The Code

The following query returns Abord â Plouffe using phpMyAdmin:

SELECT C.NAME FROM CITY C
WHERE C.REGION_ID=10 AND C.NAME_LOWERCASE LIKE '%abor%'
ORDER BY C.NAME LIMIT 30

The following loop, from a function called db_fetch_all( $result ), displays the expected values:

  while( $row = mysql_fetch_assoc( $result ) ) {
    foreach( $row as $value ) {
      echo $value . " ";
      $value = utf8_encode( $value );
      echo $value . " ";
    }

    $r[] = $row;
  }

The displayed values: 5482 5482 Abord â Plouffe Abord â Plouffe

The array is then encoded using json_encode:

$rows = db_fetch_all( $result );
echo json_encode( $rows );

Problem

The web browser receives the following value:

{"ID":"5482","NAME":null}

Instead of:

{"ID":"5482","NAME":"Abord â Plouffe"}

(Or the encoded equivalent.)

Question

The documentation states that json_encode() works on UTF-8. I can see the values being encoded from LATIN1 to UTF-8. After the call to json_encode(), however, the value becomes null.

How do I make json_encode() encode the UTF-8 values properly?

One possible solution is to use the Zend Framework, but I'd rather not if it can be avoided.
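For readers reproducing the symptom, here is a minimal sketch that needs no database. It assumes a current PHP version, where json_encode() returns false for the whole call when it meets a non-UTF-8 string, and where the JSON_PARTIAL_OUTPUT_ON_ERROR flag gives the per-value null shown above. The sample row is hypothetical and simply hard-codes the Latin-1 byte for â.

```php
<?php
// Hypothetical sample row: "â" stored as the single Latin-1 byte 0xE2.
$row = array('ID' => '5482', 'NAME' => "Abord \xE2 Plouffe");

// Without flags, the invalid UTF-8 makes the whole call fail.
$json  = json_encode($row);
$error = json_last_error();
var_dump($json === false);            // bool(true)
var_dump($error === JSON_ERROR_UTF8); // bool(true)

// With partial output, only the offending string is replaced by null.
$partial = json_encode($row, JSON_PARTIAL_OUTPUT_ON_ERROR);
echo $partial, "\n";                  // {"ID":"5482","NAME":null}
```

Checking json_last_error() (or json_last_error_msg()) right after the call is a quick way to confirm that the encoding, not the query, is at fault.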

Dave Jarvis asked May 07 '10 16:05


4 Answers

// Create an empty array for the encoded resultset
$rows = array();

// Loop over the db resultset and put encoded values into $rows
while($row = mysql_fetch_assoc($result)) {
  $rows[] = array_map('utf8_encode', $row);
}

// Output $rows
echo json_encode($rows);
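
The same idea can be tried without a database. The sketch below is an illustration under stated assumptions: the $latin1_rows array is a hypothetical stand-in for what mysql_fetch_assoc() would return, and latin1_to_utf8() is a made-up helper that uses mb_convert_encoding() (mbstring extension assumed) for the same ISO-8859-1 to UTF-8 mapping that utf8_encode() performs. utf8_encode() itself is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical stand-in for rows fetched from a latin1 column.
$latin1_rows = array(
    array('ID' => '5482', 'NAME' => "Abord \xE2 Plouffe"),
);

// Hypothetical helper: convert one value from ISO-8859-1 to UTF-8.
function latin1_to_utf8($value) {
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Convert every column of every row before encoding.
$rows = array();
foreach ($latin1_rows as $row) {
    $rows[] = array_map('latin1_to_utf8', $row);
}

$json = json_encode($rows);
echo $json, "\n"; // [{"ID":"5482","NAME":"Abord \u00e2 Plouffe"}]
```

By default json_encode() escapes non-ASCII characters as \uXXXX sequences, which is the "encoded equivalent" the question mentions.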
Kemo answered Sep 30 '22 17:09


foreach( $row as $value ) {
  $value = utf8_encode( $value );

You're not actually writing your encoded value back into the $row array there, you're only changing the local variable $value. If you want to write back when you change the variable, you would need to treat it as a reference:

foreach( $row as &$value ) {

Personally I would try to avoid references where possible, and for this case instead use array_map as posted by Kemo.
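
To make the difference concrete, here is a small sketch comparing the two loop forms on a hypothetical sample row. It uses mb_convert_encoding() (mbstring assumed) in place of utf8_encode() for the Latin-1 to UTF-8 conversion; the variable names are illustrative only.

```php
<?php
// Hypothetical sample row in Latin-1 ("â" is the byte 0xE2).
$row = array('ID' => '5482', 'NAME' => "Abord \xE2 Plouffe");

// By value: $value is a copy, so the array keeps its Latin-1 bytes.
$by_value = $row;
foreach ($by_value as $value) {
    $value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// By reference: assignments to $value are written back into the array.
$by_reference = $row;
foreach ($by_reference as &$value) {
    $value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}
unset($value); // drop the reference so later writes to $value cannot touch the last element

var_dump($by_value['NAME'] === $row['NAME']);                 // bool(true): unchanged
var_dump($by_reference['NAME'] === "Abord \xC3\xA2 Plouffe"); // bool(true): now UTF-8
```

The unset() after the by-reference loop is one reason to prefer array_map here: forgetting it leaves $value aliased to the last array element.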

Alternatively, call mysql_set_charset('utf8') so that the connection returns values in UTF-8 regardless of the actual table collations, as a first step towards migrating the app to UTF-8.

bobince answered Sep 30 '22 16:09


My solution is to insert the line mysql_query('SET CHARACTER SET utf8'); before the SELECT. This approach works well.

jailsonjan answered Sep 30 '22 16:09


It seems that rather than putting it in a query, one should put:

mysql_set_charset('utf8');

after the mysql connect statement.
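
Note that the mysql_* extension was removed in PHP 7.0. As a rough sketch of the same step with mysqli, assuming that extension is available: connect_utf8() is a hypothetical helper, and the host, credentials, and database name are placeholders to be supplied by the caller.

```php
<?php
// Hypothetical helper: open a mysqli connection and switch it to UTF-8.
function connect_utf8($host, $user, $password, $database) {
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    // Ask the server to send result strings as UTF-8, whatever the column charset is.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
    return $db;
}
```

Setting the charset once, right after connecting, means every later query on that connection returns UTF-8 and its rows can go straight to json_encode().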

Robert Imhoff answered Sep 30 '22 16:09