Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is MySQL treating é the same as e?

I'm storing Unicode strings in a MySQL database with a Django web application. I can store Unicode data fine, but when querying, I find that é and e are treated as if they were the same character:

In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]

In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]

This is also the case when using the MySQL shell directly:

mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.00 sec)

mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.01 sec)

Here are the database charset settings:

mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name            | Value                                                |
+--------------------------+------------------------------------------------------+
| character_set_client     | latin1                                               |
| character_set_connection | latin1                                               |
| character_set_database   | utf8                                                 |
| character_set_filesystem | binary                                               |
| character_set_results    | latin1                                               |
| character_set_server     | latin1                                               |
| character_set_system     | utf8                                                 |
| character_sets_dir       | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+

here's the table schema:

CREATE TABLE `auth_user` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `username` varchar(30) CHARACTER SET utf8 NOT NULL,
    `first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
    `last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
    `email` varchar(200) CHARACTER SET utf8 NOT NULL,
    `password` varchar(128) CHARACTER SET utf8 NOT NULL,
    `is_staff` tinyint(1) NOT NULL,
    `is_active` tinyint(1) NOT NULL,
    `is_superuser` tinyint(1) NOT NULL,
    `last_login` datetime NOT NULL,
    `date_joined` datetime NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin

and here are the options I'm passing via Django's DATABASES setting:

DATABASES = {
    'default': {
        # ...
        'OPTIONS': {
            'charset': 'utf8',
            'init_command': 'SET storage_engine=INNODB;',
        },
    },
}

Note that I have tried setting the table collation to utf8_bin, with no effect:

mysql> alter table auth_user collate utf8_bin;

mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén   |
+-----------+
1 row in set (0.00 sec)

How can I get MySQL to treat these as different characters?

like image 645
claymation Avatar asked Aug 02 '11 23:08

claymation


People also ask

Is MySQL like case-sensitive?

By default, it depends on the operating system and its case sensitivity. This means MySQL is case-insensitive in Windows and macOS, while it is case-sensitive in most Linux systems. However, you can change the behavior by changing collation.

Is MySQL case-insensitive?

Table names are stored in lowercase on disk and name comparisons are not case-sensitive. MySQL converts all table names to lowercase on storage and lookup. This behavior also applies to database names and table aliases.

What is the difference between utf8mb4 and utf8 charset in MySQL?

utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .

What character set should I use MySQL?

If you're using MySQL 8.0, the default charset is utf8mb4. If you elect to use UTF-8 as your collation, always use utf8mb4 (specifically utf8mb4_unicode_ci). You should not use UTF-8 because MySQL's UTF-8 is different from proper UTF-8 encoding.


2 Answers

You were nearly there when you changed the table collation, but not quite. In MySQL, each column in a table has its own character set and collation. The table has its own character set and collation, but this does not override the column collations; it only determines what the collation will be for new columns that are added for which you don't specify the collation. So you haven't changed the collation of the column that you're interested in.

ALTER TABLE tablename MODIFY columnname
    varchar(???) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
like image 200
Hammerite Avatar answered Oct 07 '22 03:10

Hammerite


You need to set a collation that treats diacritics as significant. Try using utf8_bin

like image 26
Ted Hopp Avatar answered Oct 07 '22 02:10

Ted Hopp