Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP, PDO, UTF8 and MySQL not playing ball

Tags:

php

mysql

utf-8

pdo

I'm really struggling trying to get PHP to play ball with MySQL through PDO with regards to UTF-8 characters.

I get the following using php --version:

PHP 5.3.6 (cli) (built: Apr 19 2011 13:21:12)
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies

And this is the welcome message from mysql cli:

Server version: 5.1.57 Source distribution

I have this as my database connection code:

try {
    $dbh = new PDO('mysql:host='.$server.';dbname='.$database.';charset=UTF8', $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}

And my php code to insert into MySQL:

$dbh->query("SET NAMES 'UTF8'"); 
$values = array(':idCode' => $idCode, ':name' => $name, ':address' => $address, ':postcode' => $postcode, ':tel' => $tel, ':website' => $website, ':disabled' => $disabled, ':disabledWC' => $disabledWC, ':description' => $description, ':location' => $location, ':type' => $type, ':status' => $status, ':image1name' => $image1name, ':image2name' => $image2name, ':image3name' => $image3name, ':image4name' => $image4name, ':image5name' => $image5name);
$preparedStatement = $dbh->prepare('UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode');
$preparedStatement->execute($values);

The problem I'm getting is that even though I've verified that $description does indeed contain UTF8 encoded bytes (Using bin2hex and a hex2UTF8 tool on the web) the data does not get submitted through the connection as UTF-8 and I end up with "Générik" instead of "Générik" in the database.

I have tried putting in a call to "SET NAMES UTF8" prior to executing, I've put in:

[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8

in my my.cnf but can't seem to get any joy.

Any hints or tips on how to get this to work?

I should add - if I enter the text manually in the command line MySQL client it works fine, the column is set to take UTF8 encoding.

As requested - Create Table Script:

| venues | CREATE TABLE `venues` (
  `idCode` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(512) DEFAULT NULL,
  `address` varchar(1024) DEFAULT NULL,
  `postcode` varchar(32) DEFAULT NULL,
  `tel` varchar(32) DEFAULT NULL,
  `website` varchar(1024) DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT NULL,
  `disabledWC` tinyint(4) DEFAULT NULL,
  `description` varchar(4096) CHARACTER SET utf8 DEFAULT NULL,
  `status` tinyint(4) DEFAULT NULL,
  `location` varchar(32) DEFAULT NULL,
  `type` varchar(32) DEFAULT NULL,
  `date_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `image1name` varchar(255) DEFAULT NULL,
  `image2name` varchar(255) DEFAULT NULL,
  `image3name` varchar(255) DEFAULT NULL,
  `image4name` varchar(255) DEFAULT NULL,
  `image5name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`idCode`)
) ENGINE=MyISAM AUTO_INCREMENT=129 DEFAULT CHARSET=latin1

And hte result of SELECT @@character_set_database:

SELECT @@character_set_database
    -> ;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8                     |
+--------------------------+
1 row in set (0.00 sec)

And the results of dumpParams:

SQL: [410] UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode
Params:  17
Key: Name: [7] :idCode
paramno=-1
name=[7] ":idCode"
is_param=1
param_type=2
Key: Name: [5] :name
paramno=-1
name=[5] ":name"
is_param=1
param_type=2
Key: Name: [8] :address
paramno=-1
name=[8] ":address"
is_param=1
param_type=2
Key: Name: [9] :postcode
paramno=-1
name=[9] ":postcode"
is_param=1
param_type=2
Key: Name: [4] :tel
paramno=-1
name=[4] ":tel"
is_param=1
param_type=2
Key: Name: [8] :website
paramno=-1
name=[8] ":website"
is_param=1
param_type=2
Key: Name: [9] :disabled
paramno=-1
name=[9] ":disabled"
is_param=1
param_type=2
Key: Name: [11] :disabledWC
paramno=-1
name=[11] ":disabledWC"
is_param=1
param_type=2
Key: Name: [12] :description
paramno=-1
name=[12] ":description"
is_param=1
param_type=2
Key: Name: [9] :location
paramno=-1
name=[9] ":location"
is_param=1
param_type=2
Key: Name: [5] :type
paramno=-1
name=[5] ":type"
is_param=1
param_type=2
Key: Name: [7] :status
paramno=-1
name=[7] ":status"
is_param=1
param_type=2
Key: Name: [11] :image1name
paramno=-1
name=[11] ":image1name"
is_param=1
param_type=2
Key: Name: [11] :image2name
paramno=-1
name=[11] ":image2name"
is_param=1
param_type=2
Key: Name: [11] :image3name
paramno=-1
name=[11] ":image3name"
is_param=1
param_type=2
Key: Name: [11] :image4name
paramno=-1
name=[11] ":image4name"
is_param=1
param_type=2
Key: Name: [11] :image5name
paramno=-1
name=[11] ":image5name"
is_param=1
param_type=2

Logged hex bytes (and expected string value) of $description:

 Desctription: 526f756e64696e67206f6666205361747572646179206e6967687420696e20756e666f726765747461626c65207374796c652c204672656e636820636f6d70616e792047c3a96ec3a972696b20566170657572206272696e67207468656972206e65772073686f772057617465726c69747a20746f207468652048617420466169722e204174204b696e672047656f726765205620506c6179696e67204669656c64732c2042617220456e642e: Rounding off Saturday night in unforgettable style, French company Générik Vapeur bring their new show Waterlitz to the Hat Fair. At King George V Playing Fields, Bar End.
like image 694
Matt Fellows Avatar asked May 28 '12 11:05

Matt Fellows


3 Answers

  try {
      $dbh = new PDO(
          'mysql:host='.$server.';dbname='.$database, 
          $user, 
          $password,
          array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'")
      );
      $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  } catch (PDOException $e) {
      print "Error!: " . $e->getMessage() . "<br/>";
      die();
  }

Don't encode data before puting them to the db, just pure raw utf-8 data send. Should be enough.

Regards.

like image 138
cojack Avatar answered Nov 18 '22 10:11

cojack


For the word Générik, your "logged hex bytes" are 0x47c3a96ec3a972696b. This is indeed UTF-8 encoded. The client with which you are attempting to verify your stored data is almost certainly setting the wrong character set prior to fetching the table contents.

like image 43
eggyal Avatar answered Nov 18 '22 11:11

eggyal


I had the same problem, and I figured it out with help from this post. I was feeding perfectly valid umlauts into the DB, but it was storing junk.

I basically just added this: array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'") after my connect/init password, thusly:

$this->db = new PDO($this->dsn, $this->username, $this->password, 
                    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));
like image 2
Chris Marshall Avatar answered Nov 18 '22 09:11

Chris Marshall