Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL implicitly coerces to wrong collation in view

In MySQL 5.6, a view appears to be implicitly coercing utf8_general_ci into a latin1_swedish_ci instead of the expected latin1_general_cs.

My setup:

database variables:

mysql> show variables like 'col%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_general_cs |
| collation_database   | latin1_general_cs |
| collation_server     | latin1_general_cs |
+----------------------+-------------------+

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Here's my database and table :

CREATE DATABASE `example` /*!40100 DEFAULT CHARACTER SET latin1 COLLATE latin1_general_cs */;

CREATE TABLE `example` (
  `username` varchar(20) COLLATE latin1_general_cs NOT NULL,
  PRIMARY KEY (`username`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_cs;

insert into example values ('user_a');

And my view:

create or replace view example_view as
select username
from example
where substring_index(user(), '@', 1) = example.username;

My problem:

When selecting from that view, I get the error:

mysql> select * from example_view;
ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_cs,IMPLICIT) for operation '='

When I run the select statement directly, it works.

As far as I can tell, NOTHING is set to use latin1_swedish_ci. The server, database, table, and column are all set to latin1_general_cs.

Here's what MySQL thinks the collation is for each part:

mysql> select COLLATION(username) as username, 
    -> COLLATION(user()) as user_func, 
    -> COLLATION(substring_index(user(), '@', 1)) as substr_func
    -> from example;
+-------------------+-----------------+-----------------+
| username          | user_func       | substr_func     |
+-------------------+-----------------+-----------------+
| latin1_general_cs | utf8_general_ci | utf8_general_ci |
+-------------------+-----------------+-----------------+

So MySQL is trying to convert from utf8_general_ci to match the latin1_general_cs. But somehow when in the context of a view it decides to use latin1_swedish_ci instead.

I'm aware I can just use convert() but I'd like to avoid that (partly out of curiosity, partly because lots of converts() are going to make for ugly queries).

My Questions:

Why is MySQL converting to latin1_swedish_ci instead of latin1_general_cs? How do I fix that, other than using convert() explicitly in the query?

like image 818
AndyMan1 Avatar asked Jun 02 '14 23:06

AndyMan1


People also ask

What collation should I use for utf8mb4?

If you're using MySQL 8.0, the default charset is utf8mb4. If you elect to use UTF-8 as your collation, always use utf8mb4 (specifically utf8mb4_unicode_ci).

How do I change the default collation in MySQL?

The MySQL server has a compiled-in default character set and collation. To change these defaults, use the --character-set-server and --collation-server options when you start the server. See Section 5.1.


2 Answers

Q: Why is MySQL converting to latin1_swedish_ci instead of latin1_general_cs?

Every characterset has a default collation. You can use the SHOW COLLATION statement to see this. An excerpt from the output shows that latin1_swedish_ci is the default collation for the latin1 characterset:

Collation             Charset       Id  Default  Compiled  Sortlen  
--------------------  --------  ------  -------  --------  ---------
latin1_german1_ci     latin1         5           Yes               1
latin1_swedish_ci     latin1         8  Yes      Yes               1
latin1_bin            latin1        47           Yes               1
latin1_general_ci     latin1        48           Yes               1
latin1_general_cs     latin1        49           Yes               1

We already know that every table has a default characterset and default collation. With the view definition, MySQL is actually creating a table when the query runs.

In the MySQL vernacular, it's called a "derived table".

(As an aside, MySQL does allow some views can be defined with ALGORITHM=MERGE instead of the typical and familiar ALGORITHM=TEMPTABLE. With the MERGE algorithm, we get view handling behavior that's more like the behavior of other relational databases, like Oracle and SQL Server.)

When MySQL creates the derived table, it assigns a characterset along with its the default collation.

That's where the latin1_swedish_ci is coming from... the default collation for latin1.


Q2: How do I fix that, other than using CONVERT() explicitly in the query?

You can try specifying a collation without the CONVERT() function:

CREATE VIEW example_view
AS 
SELECT username COLLATE latin1_general_cs
FROM example
WHERE SUBSTRING_INDEX(USER(), '@', 1) COLLATE latin1_general_cs = example.username;

(If your client characterset is utf8, then you're likely to encounter an error with that syntax, if you don't also have the CONVERT(... USING ...). You can use COLLATE in conjunction with the CONVERT() function.

  CONVERT(USER() USING latin1) COLLATE latin1_general_cs

NOTE: I don't have any practical experience with stored views; we use inline views all over the place. But we never create stored views, because stored views cause a myriad of problems, way bigger and way more problems than whatever problems the view definition was a solution for.

like image 168
spencer7593 Avatar answered Oct 05 '22 05:10

spencer7593


Not sure what exactly you are asking for, but just to avoid the error message you can:

http://sqlfiddle.com/#!9/2697e/3

create or replace view example_view   as
select username 
from example
where substring_index(user(), '@', 1)  = example.username COLLATE latin1_general_cs;

http://sqlfiddle.com/#!9/bf88d/1

like image 38
Alex Avatar answered Oct 05 '22 06:10

Alex