
MySQL distinction between e and é (e acute) - UNIQUE index

I have a table, students, with 3 columns: id, name, and age. I have a UNIQUE index Index_2 on columns name and age.

CREATE TABLE `bedrock`.`students` (
    `id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(45) NOT NULL,
    `age` INTEGER UNSIGNED NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `Index_2` USING BTREE (`name`, `age`)
) ENGINE = InnoDB;

I tried this insert option:

insert into students (id, name, age)
values (1, 'Ane', 23);

which works fine. Then I tried this one (note Ané, with e acute):

insert into students (id, name, age)
values (2, 'Ané', 23);

and I receive this error message:

"Duplicate entry 'Ané-23' for key 'Index_2'"

MySQL somehow does not make any distinction between "Ane" and "Ané". How can I resolve this, and why is it happening?

Charset for table students is "utf8" and collation is "utf8_general_ci".

ALTER TABLE `students` CHARACTER SET utf8 COLLATE utf8_general_ci;

Later edit1: @Crozin:

I've changed to use collation utf8_bin:

ALTER TABLE `students`
CHARACTER SET utf8 COLLATE utf8_bin;

but I receive the same error.

But if I create the table from scratch with charset utf8 and collation utf8_bin, like this:

CREATE TABLE `students2` (
    `id` INTEGER UNSIGNED AUTO_INCREMENT,
    `name` VARCHAR(45),
    `age` VARCHAR(45),
    PRIMARY KEY (`id`),
    UNIQUE INDEX `Index_2` USING BTREE (`name`, `age`)
) ENGINE = InnoDB
CHARACTER SET utf8 COLLATE utf8_bin;

both of the insert commands below work:

insert into students2 (id, name, age)
values (1, 'Ane', 23); -- works ok

insert into students2 (id, name, age)
values (2, 'Ané', 23); -- works ok

This seems to be very weird.

Later edit 2:

I saw another answer here; I'm not sure whether the user deleted it or it got lost. I was just testing it:

The user wrote that he first created three tables with three different collations:

CREATE TABLE `utf8_bin` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(45) COLLATE utf8_bin NOT NULL,
    `age` int(10) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `utf8_unicode_ci` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(45) COLLATE utf8_unicode_ci NOT NULL,
    `age` int(10) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `utf8_general_ci` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(45) COLLATE utf8_general_ci NOT NULL,
    `age` int(10) unsigned NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `Index_2` (`name`,`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

That user's results were:

Insert commands:

INSERT INTO utf8_bin VALUES (1, 'Ane', 23), (2, 'Ané', 23);
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

INSERT INTO utf8_unicode_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

INSERT INTO utf8_general_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Here are my results:

INSERT INTO utf8_bin VALUES (1, 'Ane', 23), (2, 'Ané', 23);
-- works ok

INSERT INTO utf8_unicode_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23);
-- Duplicate entry 'Ané-23' for key 'Index_2'

INSERT INTO utf8_general_ci VALUES (1, 'Ane', 23), (2, 'Ané', 23);
-- Duplicate entry 'Ané-23' for key 'Index_2'

I'm not sure why these INSERT commands worked for him but don't work for me.

He also wrote that he tested this on MySQL on Linux. Could that have something to do with it? I don't think so, though.

Paul asked Jun 24 '11 11:06



1 Answer

and collation is "utf8_general_ci".

And that's the answer. If you're using the utf8_general_ci collation (in fact this applies to the utf8_..._[ci|cs] collations in general), diacritics are ignored in comparisons, thus:

SELECT "e" = "é" AND "O" = "Ó" AND "ä" = "a"

Results in 1. Indexes also use collation.
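
As a rough illustration of the comparison above, the sketch below makes the collation explicit on each side. The `CONVERT(... USING utf8)` wrappers and the column aliases are additions for the example, not part of the original query; they assume the client sends the literals in a character set that can represent é.

```sql
-- Accent-insensitive collation: e and é compare as equal.
SELECT CONVERT('e' USING utf8) COLLATE utf8_general_ci
     = CONVERT('é' USING utf8) AS equal_general_ci;

-- Binary collation: strings are compared by code point, so they differ.
SELECT CONVERT('e' USING utf8) COLLATE utf8_bin
     = CONVERT('é' USING utf8) AS equal_bin;
```

A UNIQUE index on a column applies the same comparison rules as the column's collation, which is why 'Ane' and 'Ané' collide under utf8_general_ci.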

If you want to distinguish between ą and a then use the utf8_bin collation (keep in mind that it also distinguishes between uppercase and lowercase characters).
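
A sketch of how this could be applied to the existing students table from the question. In MySQL, `ALTER TABLE ... CHARACTER SET ... COLLATE ...` only changes the table's default for columns added later; existing columns keep their own collation, which would explain why the error persisted after the ALTER in the question's first edit. The statements below have not been run against the asker's data, so treat them as a starting point.

```sql
-- Check which collation the name column actually uses.
SHOW FULL COLUMNS FROM students;

-- Option 1: convert every character column in the table to utf8_bin.
ALTER TABLE students CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

-- Option 2 (alternative): change only the name column.
ALTER TABLE students
    MODIFY `name` VARCHAR(45) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
```

Either option rebuilds Index_2 under the new collation, after which 'Ane' and 'Ané' should be treated as distinct keys.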


By the way, name and age together don't guarantee uniqueness.

Crozin answered Oct 25 '22 14:10