Alright, I give up. I'm 2 days into this error and I need help.
Disclaimer: I will need help improving this question and will try to do a good job of describing the issue at hand, what I have done so far to address the issue, and share blog posts and the documentation that I have read in search of a solution.
The Question (also, asked below in context):
So the question is, why does the same query behave differently when run from Rails instead of from the mysql command line? Specifically, where is "(utf8_general_ci,COERCIBLE)" coming from?
The Problem: Autoresponder.find_by(keyword: 'π')
fails with the following error:
ActiveRecord::StatementInvalid: Mysql2::Error: Illegal mix of collations (utf8mb4_unicode_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=': SELECT `autoresponders`.* FROM `autoresponders` WHERE `autoresponders`.`keyword` = 'π' LIMIT 1
Autoresponder
is a model that has the attribute keyword
I read that I needed to specify my collation. So I tested out the following code:
Autoresponder.where('keyword collate utf8mb4_unicode_ci = ?', 'π')
and got the following error:
Illegal mix of collations (utf8mb4_unicode_ci,EXPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=': SELECT `autoresponders`.* FROM `autoresponders` WHERE (keyword collate utf8mb4_unicode_ci = 'π')
All that did was change the collation from IMPLICIT to EXPLICIT.
I tried running the query in Sequel Pro and it worked (with and without the collate keyword). For clarity's sake here's the query:
SELECT `autoresponders`.* FROM `autoresponders` WHERE (keyword collate utf8mb4_unicode_ci = 'π'); SELECT `autoresponders`.* FROM `autoresponders` WHERE (keyword = 'π ');
AND it works! The queries ran without errors. I also ran mysql
and was able to run the query there too. But I noticed something when I pasted the query into the mysql
command line. It automatically used the Unicode name for the character instead of the actual character. Here is the query as observed in the mysql command line:
SELECT `autoresponders`.* FROM `autoresponders` WHERE (keyword collate utf8mb4_unicode_ci ='\U+1F615');
This query works.
So the question is, why does the same query fail in Rails but work in Sequel Pro? Specifically, where is "(utf8_general_ci,COERCIBLE)" coming from, and how do I fix this mess?
I thought that it might be coming from ActiveRecord, but running ActiveRecord::Base.connection.collation
in rails console returns utf8mb4_unicode_ci
Here are my db character encoding and collation variables (and the query that retrieved them).
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; character_set_client utf8mb4 character_set_connection utf8mb4 character_set_database utf8mb4 character_set_filesystem binary character_set_results utf8mb4 character_set_server latin1 character_set_system utf8 collation_connection utf8mb4_unicode_ci collation_database utf8mb4_unicode_ci collation_server latin1_swedish_ci
Here is the create syntax for the Autorsponders table:
CREATE TABLE `autoresponders` ( `id` int(11) NOT NULL AUTO_INCREMENT, `keyword` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT '', `body` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `created_at` datetime DEFAULT NULL, `updated_at` datetime DEFAULT NULL, `provisioned_number_id` int(11) DEFAULT NULL, `outgoing_provisioned_number_id` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;
Context: Rails 4.0.13, Mysql version 5.6.22-1+deb.sury.org~precise+1-log
Here are some blog posts and SO articles that I have read so far: https://mathiasbynens.be/notes/mysql-utf8mb4
http://airbladesoftware.com/notes/fixing-mysql-illegal-mix-of-collations/
Is "SET CHARACTER SET utf8" necessary?
Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='
Not case sensitive search with active record
https://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_collation_server
All of this led me to create this meme:
Sincerely,
An exhausted fellow developer.
Thanks.
utf8mb4_unicode_ci is based on the official Unicode rules for universal sorting and comparison, which sorts accurately in a wide range of languages. utf8mb4_general_ci is a simplified set of sorting rules which aims to do as well as it can while taking many short-cuts designed to improve speed.
utf8_bin is case-sensitive because it compares the binary values of the character.
What is the meaning of the MySQL collation utf8mb4_0900_ai_ci? utf8mb4 means that each character is stored as a maximum of 4 bytes in the UTF-8 encoding scheme.
I met similar problem and solved finally. At first, my MySQL conf isοΌ
character-set-server = utf8 collation-server = utf8_general_ci
One day, i found that emoji could be saved correctly only with utf8mb4, so i change the specified column's character set and collation like this:
`nickname` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
By now, everything still fine, data could be saved and display correctly by java web application.
but when i query data like
SELECT * FROM table_name WHERE nickname LIKE '%π%';
error raises.
Finally i changed mysql conf to
character-set-server = utf8mb4 collation-server = utf8mb4_unicode_ci
and everything goes fine.
Make sure also that any call of SET NAMES utf8;
is removed or replaced by SET NAMES utf8mb4
Here is a screenshot of mysql client, notice the nickname attribute
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With