I fail to select some entries in a mysql database, comparing french strings with accentuated characters.
Datas samples below are oversimplified for more readability. And I try to be precise, so it's a bit long (sorry).
Context
In database, I have some strings like "année", "annee", "début", "debut", etc. The table and columns are charset utf8 with collate utf8_general_ci.
I'm using MySQL 5.5.30 and PHP 5.4.13 with PDO initializing with charset utf8 :
$this->dbh = new PDO('mysql:host=' . $this->host . ';dbname=' . $this->base . ';charset=utf8', $this->user, $this->pass);
In mysql console client
If I SELECT without any collate precision, like
AND data = :data
with :data = "année". I will get lines with "année" AND "annee".
Then I SELECT with utf8_bin collate precision, like
AND data = :data COLLATE utf8_bin
with :data = "année", and I retrieve only lines with "année". COOL.
With PHP PDO
If I use same SQL in PHP script, with collate precision, I get this error message :
COLLATION 'utf8_bin' is not valid for CHARACTER SET 'binary'
If I use binary collation, like
AND data = :data COLLATE `binary`
I get the error message
COLLATION 'binary' is not valid for CHARACTER SET 'utf8'
So, my questions are
In PHP script
Precisions
Here are more details (ask in comments).
*Extract from the "show create table"
CREATE TABLE `Request` (
`id` int(8) NOT NULL AUTO_INCREMENT,
`client_id` int(8) DEFAULT NULL,
....
`domain_version_corrective` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
) ENGINE=MyISAM DEFAULT CHARSET=utf8
*Two sample SELECT (from console, not with PHP / PDO):
SELECT domain_id, domain_version_corrective FROM Request WHERE client_id = 3661 AND domain_version_corrective = 'vèrçion 2.0' ;
which returns
+-----------+---------------------------+
| domain_id | domain_version_corrective |
+-----------+---------------------------+
| FOOBAR | vercion 2.0 |
| FOOBAR | vèrcion 2.0 |
| FOOBAR | verçion 2.0 |
| FOOBAR | vèrçion 2.0 |
| FOOBAR | vèrcion 2.0 |
+-----------+---------------------------+
And
SELECT domain_id, domain_version_corrective FROM Request WHERE client_id = 3661 AND domain_version_corrective = 'vèrçion 2.0' COLLATE utf8_bin;
which returns
+-----------+---------------------------+
| domain_id | domain_version_corrective |
+-----------+---------------------------+
| FOOBAR | vèrçion 2.0 |
+-----------+---------------------------+
*The charsets :
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I had the same problem and I succeeded by wrapping the parameter with convert(), like so:
data = convert(:data using utf8) collate utf8_bin
It appears that PDO always tags string parameters with character set BINARY by PDO. It would be really nice to find out how to change this.
Also, I'm not sure if it's actually correct to convert the string to UTF8 like this, I suppose technically it could depend on how PHP represents the string internally.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With