Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the purpose of character_set_connection?

Just read Stefan Gehrig excellent answer to Is "SET CHARACTER SET utf8" necessary?, which goes a bit further than MySQL's documentation at explaining the stages of interpretting and running a query w.r.t. character sets and collations, but I still can't really see the purpose of character_set_connection, or more specifically transcoding the statement from character_set_client into character_set_connection.

Why not just use character_set_client for the query and transcode straight from character_set_client to the character set of the column when comparing with column values? What is the purpose of this intermediate stage? The manual gives the example of comparing literal stings, but why would you want to do this in the first place, let alone in character_set_connection as oppose to character_set_client? Unless my understanding of this (something like "select 'somestr' = 'somestr' from x") is wrong.

Thank you.

like image 956
lm713 Avatar asked Apr 18 '13 12:04

lm713


People also ask

What is Character_set_database?

The character_set_database and collation_database system variables indicate the character set and collation of the default database.

What is utf8mb4?

utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.

What is MySQL collation?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.

What is the default collation for MySQL?

The default MySQL server character set and collation are latin1 and latin1_swedish_ci , but you can specify character sets at the server, database, table, column, and string literal levels.


2 Answers

After reading the answers and documentation, I can only think of one use case for character_set_connection (and _collation):

SELECT "StringA" < "StringB"

character_set_client only matters for the transfer to the server. character_set_connection (and the collation, which is not independent from the character set) matters for the interpretation of the statement. Whether "StringA" is less than "StringB" depends on the character set and collation of the literals. A developer might choose a character set/collation which differs from character_set_client.

In practice, character_set_connection won't matter most of the time, because literals are compared to columns, in which case the column's charset and collation is used.

Correct me if I'm wrong!

See https://dev.mysql.com/doc/refman/5.0/en/charset-connection.html:

What character set should the server translate a statement to after receiving it? For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

like image 57
Christian Avatar answered Sep 29 '22 18:09

Christian


The two are different in that character_set_client is assumed to be the charset the statement is sent from the client in, and therefore the charset the server uses to interpret the statement, while character_set_connection is what the server converts the statement into for processing.

character_set_connection is used, as discussed, for comparison of literal strings. This does not necessary mean that both sides of the equation must be literal strings, however. Eg:

WHERE column_name = 'literal_string'
     (charset col)  (charset connection)

If the character sets of the column and the connection are different, the comparison is illegal and will cause an error.

The results (and and response messages) are then encoded into character_set_results for sending back to the client.

like image 26
Hearth Avatar answered Sep 29 '22 18:09

Hearth