Values in UTF-8 being encoded as NULL in JSON

Tags: json, php, utf-8

I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto-completer. Example:

array('Coffee', 'Cappuccino', 'Café');

I should add that the array as it comes from the DB would be:

array('Coffee', 'Cappuccino', 'Café');

But JSON encodes as:

["coffee", "cappuccino", null];

If I print these via print_r(), they show up fine on a UTF-8 encoded webpage, but if I serve the output as text/plain to inspect the array with print_r($array); exit();, café comes through as "café".

If I encode using utf8_encode() before encoding to JSON, it comes through fine, but what gets printed on the webpage is "café" and not "café".

Also strange: json_last_error() is reported as an undefined function, even though json_decode() and json_encode() work fine.

Any ideas on how to get UTF-8 encoded data from the database to behave the same throughout the entire process?

EDIT: Here is the PHP function that grabs the keywords and combines them into a single array:

private function get_keywords() 
{
    global $db, $json;

    $output = array();

    $db->query("SELECT keywords FROM listings");

    while ($r = $db->get_array())
    {
        $split = explode(",", $r['keywords']);

        foreach ($split as $s)
        {
            $s = trim($s);
            if ($s != "" && !in_array($s, $output)) $output[] = strtolower($s);
        }
    }

    $json->echo_json($output);
}

The json::echo_json method just encodes the array, sets the header and prints the result (for use with Prototype).

EDIT: DB Connection method:

function connect()
{

    if ($this->set['sql_connect'])
    {
        $this->connection = @mysql_connect( $this->set['sql_host'], $this->set['sql_user'], $this->set['sql_pass'])
                OR $this->debug( "Connection Error", mysql_errno() .": ". mysql_error());
        $this->db = @mysql_select_db( $this->set['sql_name'], $this->connection)
                OR $this->debug( "Database Error", "Cannot Select Database '". $this->set['sql_name'] ."'");

        $this->is_connected = TRUE;
    }

    return TRUE;
}

More Updates: Simple PHP script I ran:

echo json_encode( array("Café") ); // ["Caf\u00e9"]
echo json_encode( array("Café") ); // null
asked Sep 12 '10 09:09 by mwieczorek

1 Answer

The reason could be the current client character set. A simple solution could be to set it with mysql_query('SET CHARACTER SET utf8') before running the SELECT query.
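
To illustrate why the client character set matters, here is a minimal sketch that needs no database. It assumes the connection is handing back ISO-8859-1 bytes instead of UTF-8, which is a guess based on the symptoms in the question, and it assumes the mbstring extension is available. The variable names are made up for the example.

```php
<?php
// "Café" as UTF-8 bytes (é = 0xC3 0xA9) and as ISO-8859-1 bytes (é = 0xE9).
$utf8   = "Caf\xC3\xA9";
$latin1 = "Caf\xE9";

// json_encode() only accepts valid UTF-8 strings.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // valid UTF-8
var_dump(mb_check_encoding($latin1, 'UTF-8')); // not valid UTF-8

// The valid string is escaped as expected.
echo json_encode(array($utf8)), "\n";

// The invalid string fails. Depending on the PHP version, the value is
// replaced by null (as in the question) or json_encode() returns false.
$result = @json_encode(array($latin1));
var_dump($result);

// Converting from the encoding the bytes are really in makes them valid UTF-8.
$fixed = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo json_encode(array($fixed)), "\n";
```

If the data arrives as UTF-8 from the connection in the first place, no conversion step is needed, which is why fixing the client character set is preferable to converting each value.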

Update (June 2014)

The mysql extension is deprecated as of PHP 5.5.0, and mysqli is now the recommended replacement. Also, upon further reading, the above way of setting the client character set should be avoided, for reasons that include security.

I haven't tested it, but this should be an ok substitute:

$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}

or, in procedural style, passing the connection as a parameter:

$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
    # TODO - Error: Unable to set the character set
    exit;
}
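
One more thing that may be worth checking, independent of the connection: the get_keywords() function in the question calls strtolower(), which works on bytes rather than characters. Depending on the PHP version and locale, it can leave "É" untouched or alter the bytes of a multibyte character. The sketch below is only an illustration, assuming the mbstring extension is available. normalize_keywords() is a hypothetical helper, not part of the original code, and the sample rows are made up.

```php
<?php
/**
 * Hypothetical helper: splits comma-separated keyword rows, lowercases each
 * keyword in a UTF-8 aware way and removes duplicates.
 *
 * @param array $rows Strings such as "Coffee, Café", assumed to be UTF-8.
 * @return array Unique lowercase keywords in first-seen order.
 */
function normalize_keywords(array $rows)
{
    $output = array();
    foreach ($rows as $row) {
        foreach (explode(',', $row) as $keyword) {
            // Lowercase before the duplicate check so "CAFÉ" and "café" match.
            $keyword = mb_strtolower(trim($keyword), 'UTF-8');
            if ($keyword !== '' && !in_array($keyword, $output, true)) {
                $output[] = $keyword;
            }
        }
    }
    return $output;
}

// Sample rows written with explicit UTF-8 byte escapes for "É" and "é".
$rows = array("Coffee, CAF\xC3\x89", "caf\xC3\xA9 ,Cappuccino");
echo json_encode(normalize_keywords($rows)), "\n";
// prints ["coffee","caf\u00e9","cappuccino"]
```

Lowercasing before the in_array() check also avoids storing the same keyword twice when it appears in different cases.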
answered Oct 30 '22 14:10 by ılǝ