Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting mysql tables from latin1 to utf8

I'm trying to convert some mysql tables from latin1 to utf8. I'm using the following command, which seems to mostly work.

ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

However, on one table I get an error about a duplicate key entry. This is caused by a unique index on a "name" field. It seems when converting to utf8, any "special" characters are indexed as their straight english equivalent. For example, there is already a record with a name field value of "Dru". When converting to utf8, a record with "Drü" is considered a duplicate. The same with "Patrick" and "Påtrìçk".

Here is how to reproduce the issue:

CREATE TABLE `example` (   `name` char(20) CHARACTER SET latin1 NOT NULL,
  PRIMARY KEY (`name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO example (name) VALUES ('Drü'),('Dru'),('Patrick'),('Påtrìçk');

ALTER TABLE example convert to character set utf8 collate utf8_general_ci;
ERROR 1062 (23000): Duplicate entry 'Dru' for key 1
like image 277
Brent Baisley Avatar asked Nov 01 '10 16:11

Brent Baisley


2 Answers

The reason why the strings 'Drü' and 'Dru' evaluate as the same is that in the utf8_general_ci collation, they count as "the same". The purpose of a collation for a character set is to provide a set of rules as to when strings are the same, when one sorts before the other, and so on.

If you want a different set of comparison rules, you need to choose a different collation. You can see the available collations for the utf8 character set by issuing SHOW COLLATION LIKE 'utf8%'. There are a bunch of collations intended for text that is mostly in a specific language; there is also the utf8_bin collation which compares all strings as binary strings (i.e. compares them as sequences of 0s and 1s).

like image 156
Hammerite Avatar answered Oct 08 '22 23:10

Hammerite


UTF8_GENERAL_CI is accent insensitive.

Use UTF8_BIN or a language-specific collation.

like image 27
Quassnoi Avatar answered Oct 08 '22 22:10

Quassnoi