I'm rewriting our database class (PDO based) and got stuck on this. I've been taught to use both SET NAMES utf8 and SET CHARACTER SET utf8 when working with UTF-8 in PHP and MySQL.
In PDO I now want to use the PDO::MYSQL_ATTR_INIT_COMMAND parameter, but it only supports one query.
Is SET CHARACTER SET utf8 necessary?
UTF-8 is a variable-width character encoding. ASCII characters keep their single-byte ASCII representation, while all other characters, such as Chinese characters, are encoded as sequences of two to four bytes.
The Difference Between Unicode and UTF-8: Unicode is a character set; UTF-8 is an encoding. Unicode is a list of characters, each with a unique number (code point): A = 65, B = 66, C = 67, ....
UTF-8 supports any Unicode character, which practically means any natural language (Coptic, Sinhala, Phoenician, Cherokee, etc.), as well as many non-spoken notations (music notation, mathematical symbols, APL). The stated objective of the Unicode Consortium is to encompass all communications.
Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as / (slash) in filenames, \ (backslash) in escape sequences, and % in printf.
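The ASCII-safety property described above can be checked with a few lines of PHP. This is a minimal sketch; the helper name hasNoAsciiBytes and the sample characters are chosen here for illustration only.

```php
<?php
// Hypothetical helper: true when no byte of the string is in the ASCII range (0x00-0x7F).
function hasNoAsciiBytes(string $utf8): bool
{
    foreach (str_split($utf8) as $byte) {
        if (ord($byte) < 0x80) {
            return false;
        }
    }
    return true;
}

// "\u{...}" escapes produce the UTF-8 bytes of the given code point (PHP 7+).
$samples = ["\u{E9}", "\u{4E2D}", "\u{1F600}"];

foreach ($samples as $s) {
    // strlen() counts bytes, not characters.
    printf("%d bytes, ASCII-free: %s\n", strlen($s), hasNoAsciiBytes($s) ? 'yes' : 'no');
}
```

Each sample is a single non-ASCII character, so none of its bytes can be mistaken for a slash, backslash, or percent sign by byte-oriented code.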
Using SET CHARACTER SET utf8 after SET NAMES utf8 will actually reset character_set_connection and collation_connection to @@character_set_database and @@collation_database respectively.
The manual states that SET NAMES x is equivalent to
SET character_set_client = x; SET character_set_results = x; SET character_set_connection = x;
and SET CHARACTER SET x is equivalent to
SET character_set_client = x; SET character_set_results = x; SET collation_connection = @@collation_database;
whereas SET collation_connection = x also internally executes SET character_set_connection = <<character_set_of_collation_x>> and SET character_set_connection = x internally also executes SET collation_connection = <<default_collation_of_character_set_x>>.
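To make the equivalences above easier to follow, here is a toy model in plain PHP. It does not talk to a MySQL server; it only applies the documented assignments to an array standing in for the session variables. The function names and the default-collation table are assumptions made for this illustration, and real defaults can differ between server versions.

```php
<?php
// Toy model only: mimics the documented equivalences, no MySQL connection involved.

function defaultCollation(string $charset): string
{
    // Assumed defaults for illustration; actual values depend on the server version.
    $defaults = ['utf8' => 'utf8_general_ci', 'latin1' => 'latin1_swedish_ci'];
    return $defaults[$charset];
}

function setNames(array $vars, string $x): array
{
    $vars['character_set_client'] = $x;
    $vars['character_set_results'] = $x;
    $vars['character_set_connection'] = $x;
    // Setting character_set_connection implicitly sets its default collation.
    $vars['collation_connection'] = defaultCollation($x);
    return $vars;
}

function setCharacterSet(array $vars, string $x): array
{
    $vars['character_set_client'] = $x;
    $vars['character_set_results'] = $x;
    // Setting collation_connection implicitly sets character_set_connection too.
    $vars['collation_connection'] = $vars['collation_database'];
    $vars['character_set_connection'] = $vars['character_set_database'];
    return $vars;
}

$session = [
    'character_set_database' => 'latin1',
    'collation_database' => 'latin1_swedish_ci',
];

$session = setNames($session, 'utf8');
$afterNames = $session['character_set_connection'];

$session = setCharacterSet($session, 'utf8');
$afterCharacterSet = $session['character_set_connection'];

echo $afterNames, ' -> ', $afterCharacterSet, PHP_EOL;
```

With a latin1 database default, the model shows the connection character set going from utf8 back to latin1 once SET CHARACTER SET runs after SET NAMES.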
So essentially you're resetting character_set_connection to @@character_set_database and collation_connection to @@collation_database. The manual explains the usage of these variables:
What character set should the server translate a statement to after receiving it?
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
To sum this up, the encoding/transcoding procedure MySQL uses to process a query and its results has multiple steps:
1. MySQL treats the incoming query as being encoded in character_set_client.
2. MySQL transcodes the statement from character_set_client into character_set_connection.
3. When comparing string values to column values, MySQL transcodes the string value from character_set_connection into the character set of the given database column and uses the column collation to do sorting and comparison.
4. MySQL builds the result set encoded in character_set_results (this includes result data as well as result metadata such as column names and so on).
So it could be the case that a SET CHARACTER SET utf8 would not be sufficient to provide full UTF-8 support. Think of a default database character set of latin1 and columns defined with the utf8 charset, and go through the steps described above. As latin1 cannot cover all the characters that UTF-8 can cover, you may lose character information in step 3.
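The kind of loss described here can be reproduced outside MySQL. The sketch below assumes the mbstring extension is available and uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is actually closer to cp1252), so it illustrates the effect rather than the server's exact behavior.

```php
<?php
// Requires the mbstring extension. ISO-8859-1 approximates MySQL's latin1 here.
mb_substitute_character(0x3F); // use "?" for characters the target charset lacks

// A UTF-8 string mixing a character latin1 has (U+00E9) with two it lacks.
$query = "caf\u{E9} \u{4E2D}\u{6587}";

// utf8 -> latin1: unrepresentable characters are replaced at this point.
$asLatin1 = mb_convert_encoding($query, 'ISO-8859-1', 'UTF-8');

// latin1 -> utf8 (for example into a utf8 column): the lost characters do not come back.
$roundTrip = mb_convert_encoding($asLatin1, 'UTF-8', 'ISO-8859-1');

echo $roundTrip, PHP_EOL;
```

The accented letter survives the round trip because latin1 contains it, while the two Chinese characters are gone for good once the intermediate latin1 conversion has happened.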
If your query contains characters that cannot be represented in latin1, these characters will be lost on transcoding from utf8 to latin1 (the default database character set), making your query fail.
So I think it's safe to say that SET NAMES ... is the correct way to handle character set issues. I might add, though, that setting up your MySQL server variables correctly (all the required variables can be set statically in your my.cnf) frees you from the performance overhead of the extra query required on every connect.
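For the PDO case from the question, a single SET NAMES statement fits into PDO::MYSQL_ATTR_INIT_COMMAND. The following is a minimal sketch, assuming the pdo_mysql extension is installed; the helper names mysqlDsn and connectUtf8, as well as the host, database, and credentials passed to them, are placeholders invented for this example.

```php
<?php
// Hypothetical helper: builds a MySQL DSN from placeholder host and database names.
function mysqlDsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db}";
}

// Hypothetical helper: opens a connection that issues SET NAMES utf8 on connect.
function connectUtf8(string $host, string $db, string $user, string $pass): PDO
{
    $options = [
        // Executed once right after connecting; one statement is all that is needed.
        PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8',
        // Report errors as exceptions instead of failing silently.
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ];
    return new PDO(mysqlDsn($host, $db), $user, $pass, $options);
}
```

Calling connectUtf8('localhost', 'example_db', 'example_user', 'secret') would then return a connection whose client, results, and connection character sets are all utf8, without a second SET CHARACTER SET query.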
From the MySQL manual:
SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:
SET character_set_client = x; SET character_set_results = x; SET collation_connection = @@collation_database;