How do I make MySQL return UTF-8?

I'm using PHPUnit to validate XML output from my PHP code, but apparently I have problems with the character encoding MySQL returns. Here is the error I get from DOMDocument:

Input is not proper UTF-8, indicate encoding! Bytes: 0xE9 0x20 0x42 0x65 

I initialize the DOMDocument so it uses the correct encoding:

$domDocument = new DOMDocument('1.0','UTF-8'); 

And when I check the output from saveXML() using mb_detect_encoding the result is UTF-8.

I also checked all the calls used to create the XML, using mb_detect_encoding on all createCDATASection parameters encountered and they are all either UTF-8 or ASCII (there are no plain text nodes, everything is in CDATA blocks).

I think the issue comes from the use of an 'é' character (which is 0xE9 in ISO 8859-1). The line which adds that character to my XML is:

$domDocument->createCDATASection($place->name); 

and mb_detect_encoding($place->name) gives me UTF-8.

The data ($place->name) is pulled from a MySQL database. This database has the UTF-8 charset.

Here is some example code:

    $query = sprintf('SELECT name FROM place where id = 1');
    $result = mysql_query($query);
    $result = mysql_fetch_assoc($result);

    // -- Feeding UTF-8 data directly WORKS
    $domDocument = new DOMDocument('1.0','UTF-8');
    $rootNode = $domDocument->createElement('Response');
    $rootNode->appendChild($domDocument->createCDATASection('Café Belga'));
    $domDocument->appendChild($rootNode);

    $matcher = array('tag' => 'Response');
    self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);

    // -- Feeding UTF-8 data from the resultset FAILS
    $domDocument = new DOMDocument('1.0','UTF-8');
    $rootNode = $domDocument->createElement('Response');
    $rootNode->appendChild($domDocument->createCDATASection($result['name']));
    $domDocument->appendChild($rootNode);

    $matcher = array('tag' => 'Response');
    self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);

In my PHPStorm debugger, the string fetched from the database looks like this:

Caf� Belga

So I think that is the root of the problem. In MySQLWorkbench the string is correct: Café Belga.

When using utf8_encode($result['name']), however, everything works fine!

One more check in the watches window:

mb_detect_encoding($result['name']) -> "UTF-8"

mb_detect_encoding(utf8_encode($result['name'])) -> "UTF-8"

On a side note, are there any sites where I can simply copy-paste those hex values and see what characters they are supposed to be in different character sets?
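For a quick local check instead of a website, a small PHP sketch along these lines can decode the reported bytes under a few candidate charsets. It assumes the mbstring extension is loaded; the helper name decodeHexAs and the choice of charsets are illustrative, not part of the original code.

```php
<?php
// Hypothetical helper: interpret a hex dump as raw bytes in each candidate
// charset and return the text converted to UTF-8, keyed by charset name.
function decodeHexAs(string $hex, array $charsets): array
{
    // Accept "0xE9 0x20" as well as "E9 20": collect every two-digit hex pair.
    preg_match_all('/(?:0x)?([0-9A-Fa-f]{2})/', $hex, $matches);
    $bytes = hex2bin(implode('', $matches[1]));

    $decoded = [];
    foreach ($charsets as $charset) {
        $decoded[$charset] = mb_convert_encoding($bytes, 'UTF-8', $charset);
    }
    return $decoded;
}

// Bytes taken from the DOMDocument error message above.
$hex = '0xE9 0x20 0x42 0x65';

// Strict validation: 0xE9 would start a three-byte UTF-8 sequence, but 0x20
// is not a continuation byte, so these bytes are not valid UTF-8.
$isValidUtf8 = mb_check_encoding(hex2bin('e9204265'), 'UTF-8');
var_dump($isValidUtf8); // bool(false)

// In both single-byte charsets 0xE9 is "é", which is C3 A9 in UTF-8.
foreach (decodeHexAs($hex, ['ISO-8859-1', 'Windows-1252']) as $charset => $text) {
    echo $charset, ': ', bin2hex($text), "\n"; // c3a9204265 for both
}
```

mb_check_encoding() validates the bytes strictly, whereas mb_detect_encoding() only guesses among candidates, so it is probably the more reliable check here.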

asked Jun 03 '11 09:06 by Joris Mans



2 Answers

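You have to define the connection to your database as UTF-8. Storing the data as UTF-8 is not enough: if the connection uses another charset (latin1 is a common default), MySQL converts the text to that charset before sending it. That would explain the lone 0xE9 byte in the error, and why utf8_encode(), which converts ISO 8859-1 to UTF-8, appeared to fix it.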

    // Set up your connection
    $connection = mysql_connect('localhost', 'user', 'pw');
    mysql_select_db('yourdb', $connection);
    mysql_query("SET NAMES 'utf8'", $connection);

    // Now you get UTF-8 encoded stuff
    $query = sprintf('SELECT name FROM place where id = 1');
    $result = mysql_query($query, $connection);
    $result = mysql_fetch_assoc($result);
answered Sep 27 '22 23:09 by strauberry


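As of PHP 5.5.0 the mysql_* functions are deprecated (they were removed in PHP 7.0), so you should use mysqli and set the connection charset with: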

mysqli_set_charset($connection,"utf8"); 
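Put together with the query from the question, this might look like the sketch below. The helper names connectUtf8 and fetchPlaceName are hypothetical, the credentials are the placeholders from the earlier answer, and the prepared statement is a stylistic choice rather than something the original code used.

```php
<?php
// Hypothetical helper: open a mysqli connection and set its charset.
// Host, user, password and database are placeholders, not real credentials.
function connectUtf8(
    string $host,
    string $user,
    string $password,
    string $database,
    string $charset = 'utf8'
): mysqli {
    // Make mysqli throw exceptions instead of silently returning false.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $connection = new mysqli($host, $user, $password, $database);

    // Unlike a plain "SET NAMES" query, set_charset() also tells the client
    // library which charset is in use (relevant for real_escape_string()).
    $connection->set_charset($charset);

    return $connection;
}

// Hypothetical helper: fetch one place name, or null if the row is missing.
function fetchPlaceName(mysqli $connection, int $id): ?string
{
    $statement = $connection->prepare('SELECT name FROM place WHERE id = ?');
    $statement->bind_param('i', $id);
    $statement->execute();
    $statement->bind_result($name);
    $found = $statement->fetch();
    $statement->close();

    return $found ? $name : null;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $connection = connectUtf8('localhost', 'user', 'pw', 'yourdb');
// $name = fetchPlaceName($connection, 1);
```

On MySQL 5.5.3 or later, passing 'utf8mb4' as the charset is likely the better choice, since MySQL's 'utf8' only covers characters of up to three bytes.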
answered Sep 28 '22 01:09 by Eric Korolev