I'm storing Unicode strings in a MySQL database with a Django web application. I can store Unicode data fine, but when querying, I find that é
and e
are treated as if they were the same character:
In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]
In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]
This is also the case when using the MySQL shell directly:
mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.01 sec)
Here are the database charset settings:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
here's the table schema:
CREATE TABLE `auth_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) CHARACTER SET utf8 NOT NULL,
`first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`email` varchar(200) CHARACTER SET utf8 NOT NULL,
`password` varchar(128) CHARACTER SET utf8 NOT NULL,
`is_staff` tinyint(1) NOT NULL,
`is_active` tinyint(1) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`last_login` datetime NOT NULL,
`date_joined` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and here are the options I'm passing via Django's DATABASES
setting:
DATABASES = {
'default': {
# ...
'OPTIONS': {
'charset': 'utf8',
'init_command': 'SET storage_engine=INNODB;',
},
},
}
Note that I have tried setting the table collation to utf8_bin
, with no effect:
mysql> alter table auth_user collate utf8_bin;
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
How can I get MySQL to treat these as different characters?
By default, it depends on the operating system and its case sensitivity. This means MySQL is case-insensitive in Windows and macOS, while it is case-sensitive in most Linux systems. However, you can change the behavior by changing collation.
Table names are stored in lowercase on disk and name comparisons are not case-sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases.
utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .
If you're using MySQL 8.0, the default charset is utf8mb4. If you elect to use UTF-8 as your collation, always use utf8mb4 (specifically utf8mb4_unicode_ci). You should not use UTF-8 because MySQL's UTF-8 is different from proper UTF-8 encoding.
You were nearly there when you changed the table collation, but not quite. In MySQL, each column in a table has its own character set and collation. The table has its own character set and collation, but this does not override the column collations; it only determines what the collation will be for new columns that are added for which you don't specify the collation. So you haven't changed the collation of the column that you're interested in.
ALTER TABLE tablename MODIFY columnname
varchar(???) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
You need to set a collation that treats diacritics as significant. Try using utf8_bin
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With