
Character set between PHP and MySQL

Tags: php, mysql, pdo

I am a little confused right now. I have a PDO connection with charset=utf8, and the DB uses latin1.

What does this mean?

My understanding is that everything PHP sends to or receives from the DB over this connection is encoded as UTF-8. However, I have read in many places that the DB should use the same charset as PHP.

Can anyone please explain in detail the role of the character set in PHP and in the MySQL DB, and what the benefit of aligning them is?

Asked by Sameh, Dec 25 '15 09:12



1 Answer

Say PHP sends some text to MySQL to be stored, with a query like:

INSERT INTO `some_table` (`foo`) VALUES
('The quick brown fox jumps over the lazy dog');

The basic intent of this query is obviously to tell MySQL to store the string "The quick brown fox jumps over the lazy dog" in the database.

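If the PDO connection is opened with charset=utf8, PHP tells MySQL that the text it sends (and expects to receive) is encoded in UTF-8, the encoding system used to turn human-readable characters into binary for transmission. PHP itself does not re-encode your strings, so the bytes your application produces must actually be UTF-8 for this to work.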
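To make "turning characters into binary" concrete, here is a minimal Python sketch (Python is used only for illustration; it is not what PHP or MySQL run internally). The helper name `encoded_bytes` is hypothetical. It shows that plain English text produces the same bytes in UTF-8 and latin1, while an accented letter does not.

```python
def encoded_bytes(text: str, encoding: str) -> bytes:
    """Return the bytes that represent `text` in the given encoding."""
    return text.encode(encoding)


english = "The quick brown fox"
accented = "café"

# Plain English letters are one identical byte each in UTF-8 and latin1.
print(encoded_bytes(english, "utf-8") == encoded_bytes(english, "latin-1"))  # True

# 'é' is two bytes in UTF-8 but a single byte in latin1.
print(encoded_bytes(accented, "utf-8"))    # b'caf\xc3\xa9'
print(encoded_bytes(accented, "latin-1"))  # b'caf\xe9'
```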
Because the connection is declared as UTF-8, MySQL decodes the incoming bytes as UTF-8 and has no problem understanding that they represent the human-readable characters T, h, e, and so on.
If MySQL is configured to store the data in some_table using latin1 (the character set of the table or of the foo column), then when it receives the string it converts the characters from their UTF-8 encodings to their latin1 equivalents before writing the data to disk.
In this case there is no problem, because English alphabet characters can be represented in both UTF-8 and latin1.
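The conversion step can be modelled the same way. This is a rough Python sketch, not MySQL's implementation: the hypothetical `store_as_latin1` decodes the received bytes as UTF-8 (what the connection charset declares) and re-encodes them for a latin1 column. It uses Python's `cp1252` codec because MySQL's latin1 is in fact Windows-1252.

```python
def store_as_latin1(received: bytes) -> bytes:
    """Model of a UTF-8 connection feeding a latin1 column (an assumption)."""
    text = received.decode("utf-8")  # interpret bytes as the connection charset says
    return text.encode("cp1252")     # re-encode for the latin1 (cp1252) column


sent = "The quick brown fox jumps over the lazy dog".encode("utf-8")
stored = store_as_latin1(sent)

# English letters have identical bytes in both encodings, so nothing changes.
print(stored == sent)  # True
```

For text containing only English letters the stored bytes equal the sent bytes, which is why the mismatch goes unnoticed at first.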
However, problems occur if the string PHP sent contains characters that UTF-8 can represent but latin1 cannot, e.g. a Chinese character such as 中 or a Greek letter such as Ω. (MySQL's latin1 is actually Windows-1252, so a few extra symbols such as smart quotes and € do fit.) When MySQL tries to convert such a character, it cannot, because latin1 simply has no code defined to represent it.
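A quick way to see which characters latin1 can hold is to try encoding them. The sketch below assumes Python's `cp1252` codec as a stand-in for MySQL's latin1; `fits_in_latin1` is a hypothetical helper. Characters are printed by their Unicode names to keep the output plain ASCII.

```python
import unicodedata


def fits_in_latin1(ch: str) -> bool:
    """True if Windows-1252 (MySQL's latin1) has a code for `ch`."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


for ch in ["T", "é", "\u201d", "€", "中", "Ω"]:
    print(unicodedata.name(ch), fits_in_latin1(ch))
```

With these inputs the first four characters (including the right double quotation mark and the euro sign) fit, while the CJK ideograph and the Greek omega do not.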
What happens next depends on MySQL's SQL mode: in strict mode the statement is rejected with an "Incorrect string value" error, while in non-strict mode the character is typically stored as ? and a warning is issued. Either way the original character is lost, and it cannot be recovered from the stored data.
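In MySQL the outcome for an unconvertible character depends on the SQL mode: strict mode rejects the value, while non-strict mode typically stores ? instead. The Python sketch below only imitates those two outcomes under stated assumptions: raising `UnicodeEncodeError` stands in for the strict-mode error, and Python's `errors="replace"` (which substitutes `?` when encoding) stands in for non-strict mode. `convert_for_latin1` is a hypothetical name.

```python
def convert_for_latin1(text: str, strict: bool) -> bytes:
    """Rough model of the two outcomes; not MySQL's actual code."""
    if strict:
        # Raises UnicodeEncodeError, standing in for MySQL's
        # "Incorrect string value" error.
        return text.encode("cp1252")
    # Non-strict stand-in: each unencodable character becomes b"?".
    return text.encode("cp1252", errors="replace")


stored = convert_for_latin1("Price: 中", strict=False)
print(stored)                   # b'Price: ?'
print(stored.decode("cp1252"))  # Price: ?  (the original character is gone)
```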
Because the problem only affects characters that latin1 cannot represent, and 99% of your data may be plain English text, you may not notice it for quite a while, and even then only as the occasional wrong character. Trying to recover the data once you do notice can be frustrating. That is the benefit of aligning the two: if the table also uses UTF-8 (in MySQL preferably utf8mb4, since the older utf8 cannot store 4-byte characters such as emoji), MySQL has nothing to convert and no character is lost.
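Since the damage can go unnoticed, one option is to check text before it is stored (or to scan existing rows) for characters the column cannot hold. A minimal Python sketch, again assuming `cp1252` models MySQL's latin1; `unstorable_chars` is a hypothetical helper, not part of PHP, PDO or MySQL.

```python
from typing import List


def unstorable_chars(text: str, encoding: str = "cp1252") -> List[str]:
    """List distinct characters in `text` that `encoding` cannot represent,
    in order of first appearance."""
    bad: List[str] = []
    for ch in text:
        if ch in bad:
            continue  # already reported
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


rows = ["The quick brown fox", "naïve café", "東京 and Ω"]
for row in rows:
    lost = unstorable_chars(row)
    print(ascii(row), "->", len(lost), "character(s) would be lost")
```

Reporting each distinct character once keeps the output short when a row repeats the same character.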

Answered by the_velour_fog, Oct 07 '22 20:10
